Posted to user@mesos.apache.org by Frank Scholten <fr...@frankscholten.nl> on 2016/10/04 08:58:27 UTC

Troubleshooting tasks that are stuck in the 'Staging' state

Hi all,

I am looking for some ways to troubleshoot or debug tasks that are
stuck in the 'staging' state. Typically they have no logs in the
sandbox.

Are there any endpoints or things to look for in logs to identify
a root cause?

Is there a troubleshooting guide for Mesos to solve problems like this?

Cheers,

Frank

Re: Troubleshooting tasks that are stuck in the 'Staging' state

Posted by haosdent <ha...@gmail.com>.
> How do you typically monitor the messages between Master and Agents?
On my side, I don't monitor this; I only check the logs when
troubleshooting a problem.
Not sure whether other users or developers have tools that meet your
requirement here.
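
For a coarse view without extra tooling, something like this can work (a
sketch only: the port is the default master port and the log paths assume
the /var/log/mesos layout of the distribution packages, so adjust for your
setup):

```
# Per-message counters exposed by the master; exact metric names vary by
# Mesos version, so just grep for "messages".
curl -s http://mesos-master.example.com:5050/metrics/snapshot | tr ',' '\n' | grep messages

# Follow a single task id through both sides of the conversation.
grep 'my-task-id' /var/log/mesos/mesos-master.INFO
grep 'my-task-id' /var/log/mesos/mesos-slave.INFO
```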

On Wed, Oct 5, 2016 at 8:16 PM, Frank Scholten <fr...@frankscholten.nl>
wrote:

> Ok. How do you typically monitor the messages between Master and
> Agents? Do you have some tools for this on the cluster?
>
> On Tue, Oct 4, 2016 at 6:21 PM, haosdent <ha...@gmail.com> wrote:
> > Hi @Frank, thanks for the information.
> >
> >> I see messages 'Telling agent (...) to kill task (...)'. Why does this
> >> happen?
> > This is most likely because your framework sent a `KillTaskMessage` or a
> > `scheduler::Call::KILL` request to the Mesos master, so Mesos is going
> > to kill your task.
> >
> >>Is this the exact text to search for or is this the name of the protobuf
> >> message? Are these logged on a higher log level?
> > It appears in the agent logs. It looks like:
> > ```
> > I1004 23:19:36.175673 45405 slave.cpp:1539] Got assigned task '1' for
> > framework e7287433-36f9-48dd-8633-8a6ac7083a43-0000
> > I1004 23:19:36.176206 45405 slave.cpp:1696] Launching task '1' for
> > framework e7287433-36f9-48dd-8633-8a6ac7083a43-0000
> > ```
> > Usually you can grep for your task id in the agent log to see how the
> > task failed.
> >
> >
> >
> > On Tue, Oct 4, 2016 at 8:50 PM, Frank Scholten <fr...@frankscholten.nl>
> > wrote:
> >>
> >> Thanks Haosdent for your quick response.
> >>
> >> I added GLOG_v=1 to the master and agents.
> >>
> >> 1. The framework is registered. Marathon in this case.
> >> 2. I see messages 'Telling agent (...) to kill task (...)'. Why does
> >> this happen? I also see 'Sending explicit reconciliation state
> >> TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
> >> 3. I searched for RunTaskMessage in the agent log but could not find
> >> it. Is this the exact text to search for or is this the name of the
> >> protobuf message? Are these logged on a higher log level?
> >>
> >> On Tue, Oct 4, 2016 at 11:22 AM, haosdent <ha...@gmail.com> wrote:
> >> > Staging is the initial status of a task. I think you can check your
> >> > logs via these steps:
> >> >
> >> > 1. Did your framework register successfully with the master?
> >> > 2. Did the master send resource offers to your framework, and did your
> >> > framework accept them?
> >> > 3. Did your agents receive the RunTaskMessage from the master to launch
> >> > your task?
> >> >
> >> > Additionally, using `export GLOG_v=1` before starting the masters and
> >> > agents may be helpful for your troubleshooting.
> >> >
> >> > On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten <frank@frankscholten.nl>
> >> > wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> I am looking for some ways to troubleshoot or debug tasks that are
> >> >> stuck in the 'staging' state. Typically they have no logs in the
> >> >> sandbox.
> >> >>
> >> >> Are there any endpoints or things to look for in logs to identify
> >> >> a root cause?
> >> >>
> >> >> Is there a troubleshooting guide for Mesos to solve problems like
> >> >> this?
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Frank
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards,
> >> > Haosdent Huang
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>



-- 
Best Regards,
Haosdent Huang

Re: Troubleshooting tasks that are stuck in the 'Staging' state

Posted by Frank Scholten <fr...@frankscholten.nl>.
Ok. How do you typically monitor the messages between Master and
Agents? Do you have some tools for this on the cluster?

On Tue, Oct 4, 2016 at 6:21 PM, haosdent <ha...@gmail.com> wrote:
> Hi @Frank, thanks for the information.
>
>> I see messages 'Telling agent (...) to kill task (...)'. Why does this
>> happen?
> This is most likely because your framework sent a `KillTaskMessage` or a
> `scheduler::Call::KILL` request to the Mesos master, so Mesos is going
> to kill your task.
>
>>Is this the exact text to search for or is this the name of the protobuf
>> message? Are these logged on a higher log level?
> It appears in the agent logs. It looks like:
> ```
> I1004 23:19:36.175673 45405 slave.cpp:1539] Got assigned task '1' for
> framework e7287433-36f9-48dd-8633-8a6ac7083a43-0000
> I1004 23:19:36.176206 45405 slave.cpp:1696] Launching task '1' for framework
> e7287433-36f9-48dd-8633-8a6ac7083a43-0000
> ```
> Usually you can grep for your task id in the agent log to see how the
> task failed.
>
>
>
> On Tue, Oct 4, 2016 at 8:50 PM, Frank Scholten <fr...@frankscholten.nl>
> wrote:
>>
>> Thanks Haosdent for your quick response.
>>
>> I added GLOG_v=1 to the master and agents.
>>
>> 1. The framework is registered. Marathon in this case.
>> 2. I see messages 'Telling agent (...) to kill task (...)'. Why does
>> this happen? I also see 'Sending explicit reconciliation state
>> TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
>> 3. I searched for RunTaskMessage in the agent log but could not find
>> it. Is this the exact text to search for or is this the name of the
>> protobuf message? Are these logged on a higher log level?
>>
>> On Tue, Oct 4, 2016 at 11:22 AM, haosdent <ha...@gmail.com> wrote:
>> > Staging is the initial status of a task. I think you can check your
>> > logs via these steps:
>> >
>> > 1. Did your framework register successfully with the master?
>> > 2. Did the master send resource offers to your framework, and did your
>> > framework accept them?
>> > 3. Did your agents receive the RunTaskMessage from the master to launch
>> > your task?
>> >
>> > Additionally, using `export GLOG_v=1` before starting the masters and
>> > agents may be helpful for your troubleshooting.
>> >
>> > On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten <fr...@frankscholten.nl>
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I am looking for some ways to troubleshoot or debug tasks that are
>> >> stuck in the 'staging' state. Typically they have no logs in the
>> >> sandbox.
>> >>
>> >> Are there any endpoints or things to look for in logs to identify
>> >> a root cause?
>> >>
>> >> Is there a troubleshooting guide for Mesos to solve problems like this?
>> >>
>> >> Cheers,
>> >>
>> >> Frank
>> >
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Haosdent Huang
>
>
>
>
> --
> Best Regards,
> Haosdent Huang

Re: Troubleshooting tasks that are stuck in the 'Staging' state

Posted by haosdent <ha...@gmail.com>.
Hi @Frank, thanks for the information.

> I see messages 'Telling agent (...) to kill task (...)'. Why does this
happen?
This is most likely because your framework sent a `KillTaskMessage` or a
`scheduler::Call::KILL` request to the Mesos master, so Mesos is going
to kill your task.

>Is this the exact text to search for or is this the name of the protobuf
message? Are these logged on a higher log level?
It appears in the agent logs. It looks like:
```
I1004 23:19:36.175673 45405 slave.cpp:1539] Got assigned task '1' for
framework e7287433-36f9-48dd-8633-8a6ac7083a43-0000
I1004 23:19:36.176206 45405 slave.cpp:1696] Launching task '1' for
framework e7287433-36f9-48dd-8633-8a6ac7083a43-0000
```
Usually you can grep for your task id in the agent log to see how the
task failed.
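
Something along these lines, for example (the log path assumes the default
/var/log/mesos location and `my-task-id` is just a placeholder):

```
# Show the agent log lines around your task, including the failure reason.
grep -A 3 'my-task-id' /var/log/mesos/mesos-slave.INFO

# If an executor did start, its sandbox stderr is also worth a look
# (work_dir below is the one used by the distribution packages; yours may differ).
find /var/lib/mesos/slaves -path '*my-task-id*' -name stderr 2>/dev/null
```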



On Tue, Oct 4, 2016 at 8:50 PM, Frank Scholten <fr...@frankscholten.nl>
wrote:

> Thanks Haosdent for your quick response.
>
> I added GLOG_v=1 to the master and agents.
>
> 1. The framework is registered. Marathon in this case.
> 2. I see messages 'Telling agent (...) to kill task (...)'. Why does
> this happen? I also see 'Sending explicit reconciliation state
> TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
> 3. I searched for RunTaskMessage in the agent log but could not find
> it. Is this the exact text to search for or is this the name of the
> protobuf message? Are these logged on a higher log level?
>
> On Tue, Oct 4, 2016 at 11:22 AM, haosdent <ha...@gmail.com> wrote:
> > Staging is the initial status of a task. I think you can check your
> > logs via these steps:
> >
> > 1. Did your framework register successfully with the master?
> > 2. Did the master send resource offers to your framework, and did your
> > framework accept them?
> > 3. Did your agents receive the RunTaskMessage from the master to launch
> > your task?
> >
> > Additionally, using `export GLOG_v=1` before starting the masters and
> > agents may be helpful for your troubleshooting.
> >
> > On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten <fr...@frankscholten.nl>
> > wrote:
> >>
> >> Hi all,
> >>
> >> I am looking for some ways to troubleshoot or debug tasks that are
> >> stuck in the 'staging' state. Typically they have no logs in the
> >> sandbox.
> >>
> >> Are there any endpoints or things to look for in logs to identify
> >> a root cause?
> >>
> >> Is there a troubleshooting guide for Mesos to solve problems like this?
> >>
> >> Cheers,
> >>
> >> Frank
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>



-- 
Best Regards,
Haosdent Huang

Re: Troubleshooting tasks that are stuck in the 'Staging' state

Posted by Frank Scholten <fr...@frankscholten.nl>.
Thanks Haosdent for your quick response.

I added GLOG_v=1 to the master and agents.

1. The framework is registered. Marathon in this case.
2. I see messages 'Telling agent (...) to kill task (...)'. Why does
this happen? I also see 'Sending explicit reconciliation state
TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
3. I searched for RunTaskMessage in the agent log but could not find
it. Is this the exact text to search for or is this the name of the
protobuf message? Are these logged on a higher log level?
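
In case it helps, this is roughly how I was searching (the /var/log/mesos
paths below are just where glog writes on our boxes):

```
# Agent side: looking for the RunTaskMessage and for the task itself.
grep -E 'RunTaskMessage|fake-marathon-pacemaker-task' /var/log/mesos/mesos-slave.INFO

# Master side: the kill and reconciliation lines quoted in point 2.
grep -E 'Telling agent .* to kill task|explicit reconciliation' /var/log/mesos/mesos-master.INFO
```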

On Tue, Oct 4, 2016 at 11:22 AM, haosdent <ha...@gmail.com> wrote:
> Staging is the initial status of a task. I think you can check your logs via
> these steps:
>
> 1. Did your framework register successfully with the master?
> 2. Did the master send resource offers to your framework, and did your
> framework accept them?
> 3. Did your agents receive the RunTaskMessage from the master to launch your
> task?
>
> Additionally, using `export GLOG_v=1` before starting the masters and agents
> may be helpful for your troubleshooting.
>
> On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten <fr...@frankscholten.nl>
> wrote:
>>
>> Hi all,
>>
>> I am looking for some ways to troubleshoot or debug tasks that are
>> stuck in the 'staging' state. Typically they have no logs in the
>> sandbox.
>>
>> Are there any endpoints or things to look for in logs to identify
>> a root cause?
>>
>> Is there a troubleshooting guide for Mesos to solve problems like this?
>>
>> Cheers,
>>
>> Frank
>
>
>
>
> --
> Best Regards,
> Haosdent Huang

Re: Troubleshooting tasks that are stuck in the 'Staging' state

Posted by haosdent <ha...@gmail.com>.
Staging is the initial status of a task. I think you can check your logs via
these steps:

1. Did your framework register successfully with the master?
2. Did the master send resource offers to your framework, and did your
framework accept them?
3. Did your agents receive the RunTaskMessage from the master to launch your
task?

Additionally, using `export GLOG_v=1` before starting the masters and agents
may be helpful for your troubleshooting.
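
A minimal sketch of how these checks could look from the command line (host,
port, log paths, and the use of `jq` are assumptions, adjust for your setup):

```
# 1. Is the framework registered? List the frameworks the master knows about.
curl -s http://mesos-master.example.com:5050/state | jq '.frameworks[].name'

# 2. With verbose logging the master logs when it sends offers
#    (exact wording may differ by version).
grep 'Sending .* offers' /var/log/mesos/mesos-master.INFO

# 3. On the agent, look for the task being assigned and launched.
grep -E 'Got assigned task|Launching task' /var/log/mesos/mesos-slave.INFO
```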

On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten <fr...@frankscholten.nl>
wrote:

> Hi all,
>
> I am looking for some ways to troubleshoot or debug tasks that are
> stuck in the 'staging' state. Typically they have no logs in the
> sandbox.
>
> Are there any endpoints or things to look for in logs to identify
> a root cause?
>
> Is there a troubleshooting guide for Mesos to solve problems like this?
>
> Cheers,
>
> Frank
>



-- 
Best Regards,
Haosdent Huang