Posted to user@mesos.apache.org by 罗 辉 <lu...@zetyun.com> on 2018/03/09 02:57:07 UTC

Re: Status update: task 1 is in state TASK_ERROR

Yes, I modified my code as follows:

  import org.apache.mesos.v1.Protos.TaskStatus
  import org.apache.mesos.v1.scheduler.Mesos

  // Returns the human-readable message attached to a task status update.
  def acknowledgeTaskMessage(taskStatus: TaskStatus): String =
    taskStatus.getMessage

  def update(mesos: Mesos, status: TaskStatus) = {
    val message = acknowledgeTaskMessage(status)
    println("The message of current task is :" + message)
    println("Status update: task " + status.getTaskId().getValue() +
      " is in state " + status.getState().getValueDescriptor().getName())

......

And I got the following log (starting at line 231 of the attached file):
231 Received an UPDATE event
232 The message of current task is :Total resources cpus(allocated: controller):6; mem(allocated: controller):8000 required by task and its executor is more than available cpus(allocated: controller)(reservations: [(STATIC,controller)]):6; mem(allocated: controller)(reservations: [(STATIC,controller)]):8000; disk(allocated: controller)(reservations: [(STATIC,controller)]):550264; ports(allocated: controller):[31000-32000]
233 Status update: task 1 is in state TASK_ERROR
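Besides getMessage, a v1 TaskStatus also has optional reason and source fields (Java getters getReason and getSource, guarded by hasReason and hasSource), and logging them together usually pinpoints why a task failed. The sketch below assumes a hypothetical StatusSnapshot case class standing in for the protobuf message, so it runs without the Mesos jar:

```scala
// Hypothetical, simplified stand-in for the TaskStatus fields that matter when
// debugging. In a real scheduler these values would come from the protobuf getters.
final case class StatusSnapshot(
    taskId: String,
    state: String,
    reason: Option[String],
    source: Option[String],
    message: Option[String])

object StatusLogging {
  // Builds one log line, appending reason, source and message only when present.
  def describe(s: StatusSnapshot): String = {
    val parts = Seq(
      Some(s"task ${s.taskId} is in state ${s.state}"),
      s.reason.map(r => s"reason '$r'"),
      s.source.map(src => s"source '$src'"),
      s.message.map(m => s"message '$m'")
    ).flatten
    parts.mkString("Status update: ", ", ", "")
  }

  def main(args: Array[String]): Unit = {
    val failed = StatusSnapshot("1", "TASK_FAILED",
      Some("REASON_EXECUTOR_REGISTRATION_TIMEOUT"), Some("SOURCE_AGENT"),
      Some("Executor did not register within 1mins"))
    println(describe(failed))
  }
}
```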



罗辉

Infrastructure

________________________________
From: Benjamin Mahler <bm...@apache.org>
Sent: March 9, 2018 9:24:37
To: user
Subject: Re: Status update: task 1 is in state TASK_ERROR

Can you log the message provided in the TaskStatus?

https://github.com/apache/mesos/blob/1.5.0/include/mesos/v1/mesos.proto#L2424

On Wed, Mar 7, 2018 at 11:23 PM, 罗 辉 <lu...@zetyun.com> wrote:

Hi guys:

I have a Mesos test app, closely modeled on

https://github.com/apache/mesos/blob/master/src/java/src/org/apache/mesos/v1/scheduler/V1Mesos.java

that just runs a simple task, "free -m". The app never runs the task successfully; it always logs:

Received an UPDATE event
Status update: task 1 is in state TASK_ERROR


I checked the logs, but there are no errors in mesos-master.ERROR or mesos-agent.ERROR; only mesos-master.INFO shows:

W0307 17:55:28.180716 29438 validation.cpp:1298] Executor 'default' for task '1' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0307 17:55:28.180766 29438 validation.cpp:1310] Executor 'default' for task '1' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
Based on this log, I looked for a way to set the executor's resources, but could not find one or a similar code example.

Why does my little app always fail? Thanks for any ideas.
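The two lines above are warnings (the leading W is the glog warning marker) about an executor that declares no resources of its own. In the v1 protobuf API an ExecutorInfo has its own repeated resources field, separate from the task's, and whatever it declares is drawn from the same offer as the task. The sketch below only models that check with hypothetical names (ExecutorResources, belowMinimum) and the thresholds quoted in the log; it is not Mesos's validation code:

```scala
// Hypothetical model of the resources declared on an ExecutorInfo.
// None means the resource was not declared at all (the "(None)" in the log).
final case class ExecutorResources(cpus: Option[Double], memMb: Option[Double])

object ExecutorMinimums {
  // Minimums quoted in the master's validation warnings.
  val MinCpus: Double = 0.01
  val MinMemMb: Double = 32.0

  // Names of the resources that would trigger a warning: missing or below the minimum.
  def belowMinimum(r: ExecutorResources): List[String] =
    List(
      if (r.cpus.forall(_ < MinCpus)) Some("cpus") else None,
      if (r.memMb.forall(_ < MinMemMb)) Some("mem") else None
    ).flatten

  def main(args: Array[String]): Unit = {
    println(belowMinimum(ExecutorResources(None, None)))            // the situation in the log
    println(belowMinimum(ExecutorResources(Some(0.1), Some(32.0)))) // meets both minimums
  }
}
```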



San


Re: Re: Re: Status update: task 1 is in state TASK_ERROR

Posted by Benjamin Mahler <bm...@apache.org>.
What kind of tasks are you trying to run?

If you want to run commands or containers, you can just use the built-in
DEFAULT executor:
https://github.com/apache/mesos/blob/1.5.0/include/mesos/v1/mesos.proto#L713-L725

If you need a custom executor because your tasks are not commands or
containers, then you can implement your own custom executor:
https://github.com/apache/mesos/blob/1.5.0/include/mesos/v1/mesos.proto#L727-L730

In the latter case, you will have to implement your own executor or use an
existing third party executor. If implementing your own, you need to speak
the v1 protocol to the agent. We maintain a listing of known executor API
libraries here:
http://mesos.apache.org/documentation/latest/api-client-libraries/#executor-api
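One way to read the options above, sketched with hypothetical types (Workload, LaunchPlan) rather than the Mesos protobufs: a plain shell command such as "free -m" needs no ExecutorInfo at all (set TaskInfo.command and the agent uses its built-in command executor), the DEFAULT executor is used through a LAUNCH_GROUP operation, and only a CUSTOM executor requires shipping a program that speaks the v1 executor protocol. This summarizes our understanding of the linked mesos.proto comments, so check them before relying on it:

```scala
object ExecutorChoice {
  // Hypothetical description of what the framework wants to run.
  sealed trait Workload
  case object ShellCommand extends Workload  // e.g. "free -m"
  case object TaskGroup extends Workload     // tasks sharing the built-in default executor
  case object OtherWorkload extends Workload // neither a command nor a container

  // Hypothetical summary of the offer operation and ExecutorInfo.type to use.
  final case class LaunchPlan(operation: String, executorType: Option[String], note: String)

  def plan(w: Workload): LaunchPlan = w match {
    case ShellCommand =>
      LaunchPlan("LAUNCH", None,
        "set TaskInfo.command and leave TaskInfo.executor unset")
    case TaskGroup =>
      LaunchPlan("LAUNCH_GROUP", Some("DEFAULT"),
        "ExecutorInfo.command must not be set for the DEFAULT executor")
    case OtherWorkload =>
      LaunchPlan("LAUNCH", Some("CUSTOM"),
        "the executor must subscribe to the agent over the v1 executor API")
  }

  def main(args: Array[String]): Unit =
    Seq(ShellCommand, TaskGroup, OtherWorkload).foreach(w => println(s"$w -> ${plan(w)}"))
}
```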


Re: Re: Status update: task 1 is in state TASK_ERROR

Posted by 罗 辉 <lu...@zetyun.com>.
Hi guys,
For more info, I have attached my framework app's log and the master/agent logs.
My app fails as described at the end of the log:
The message of current task is :Executor did not register within 1mins
Status update: task 1 is in state TASK_FAILED
Aborting because task 1 is in unexpected state TASK_FAILED with reason 'REASON_EXECUTOR_REGISTRATION_TIMEOUT' from source 'SOURCE_AGENT' with message 'Executor did not register within 1mins'

My thoughts on this failure:
1. I guess there should be a V1 executor class with a register method that registers the executor with the agent?
2. I studied the V0 executor implementation and tried to implement a V1 executor, which would extend an executor interface and implement the abstract methods such as register, reregister, etc. However, I didn't find a V1 executor interface in the Java API. Does that mean I am heading in the wrong direction?

In short, any ideas about the REASON_EXECUTOR_REGISTRATION_TIMEOUT failure?
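REASON_EXECUTOR_REGISTRATION_TIMEOUT means the agent started the executor process but did not receive its SUBSCRIBE call within the agent's --executor_registration_timeout (1mins by default, matching the message). With the v1 HTTP executor API, subscribing amounts to POSTing a SUBSCRIBE call to the agent's /api/v1/executor endpoint and keeping the streaming response open; the executor then receives events such as LAUNCH and must report task states with UPDATE calls. The sketch below only builds the SUBSCRIBE request from the MESOS_FRAMEWORK_ID, MESOS_EXECUTOR_ID and MESOS_AGENT_ENDPOINT environment variables the agent sets for the executor. The object and method names are hypothetical, and the JSON shape reflects our reading of executor.proto, not a tested client:

```scala
object ExecutorSubscribe {
  // Quotes a string for JSON, escaping backslashes and double quotes only
  // (enough for typical Mesos IDs; control characters are not handled).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Builds the JSON body of a v1 executor SUBSCRIBE call, or explains what is missing.
  def subscribeBody(env: Map[String, String]): Either[String, String] =
    for {
      fw <- env.get("MESOS_FRAMEWORK_ID").toRight("MESOS_FRAMEWORK_ID is not set")
      ex <- env.get("MESOS_EXECUTOR_ID").toRight("MESOS_EXECUTOR_ID is not set")
    } yield s"""{"type":"SUBSCRIBE","framework_id":{"value":${quote(fw)}},"executor_id":{"value":${quote(ex)}},"subscribe":{}}"""

  // URL the body would be POSTed to (Content-Type: application/json).
  // Assumes MESOS_AGENT_ENDPOINT is "host:port" and the agent serves plain HTTP.
  def subscribeUrl(env: Map[String, String]): Option[String] =
    env.get("MESOS_AGENT_ENDPOINT").map(ep => s"http://$ep/api/v1/executor")

  def main(args: Array[String]): Unit = {
    val env = sys.env
    println(subscribeUrl(env).getOrElse("MESOS_AGENT_ENDPOINT is not set"))
    println(subscribeBody(env).fold(err => err, body => body))
  }
}
```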

San

Re: Re: Status update: task 1 is in state TASK_ERROR

Posted by 罗 辉 <lu...@zetyun.com>.
Thanks Benjamin,

I tried to understand the missing reservation metadata and looked up the related docs on resource reservation; however, I didn't find much documentation about it.

I solved this problem by adding a method like the one below to my scheduler:

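  import org.apache.mesos.v1.Protos.{Offer, TaskInfo}
  import org.apache.mesos.v1.scheduler.Protos.Call

  // Builds an ACCEPT call that launches `task` using `offer`. `frameworkId` is
  // assumed to be the FrameworkID the scheduler stored from its SUBSCRIBED event.
  def luanchtask(offer: Offer, task: TaskInfo): Call = {
    Call.newBuilder()
      .setFrameworkId(frameworkId)
      .setType(Call.Type.ACCEPT)
      .setAccept(
        Call.Accept.newBuilder()
          .addOfferIds(offer.getId)
          .addOperations(
            Offer.Operation.newBuilder()
              .setType(Offer.Operation.Type.LAUNCH)
              .setLaunch(
                Offer.Operation.Launch.newBuilder()
                  .addTaskInfos(task))))
      .build()
  }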

After that, I ran into another problem: my task always stays in staging and is terminated after 1 minute due to a timeout. A scheduler app seems to involve many small steps and callbacks, such as connecting, registering, getting the list of offers, accepting an offer, etc. Is there a detailed programming guide for developing V1 frameworks?
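A minimal model of the callbacks listed above, assuming hypothetical Event and Call types that mirror a few v1 scheduler event and call types (SUBSCRIBED, OFFERS, UPDATE, HEARTBEAT; ACCEPT, DECLINE, ACKNOWLEDGE). It shows the order of the steps and one rule that is easy to miss: a status update that carries a uuid must be acknowledged. It is a sketch, not a substitute for the v1 scheduler HTTP API documentation:

```scala
object SchedulerFlow {
  // Hypothetical, simplified stand-ins for a few v1 scheduler events.
  sealed trait Event
  final case class Subscribed(frameworkId: String) extends Event
  final case class Offers(offerIds: List[String]) extends Event
  final case class Update(taskId: String, state: String, uuid: Option[String]) extends Event
  case object Heartbeat extends Event

  // Hypothetical, simplified stand-ins for the calls the scheduler sends back.
  sealed trait Call
  final case class Accept(offerId: String) extends Call
  final case class Decline(offerIds: List[String]) extends Call
  final case class Acknowledge(taskId: String, uuid: String) extends Call

  // Launches a single task on the first usable offer and declines everything else.
  final class MiniScheduler {
    var frameworkId: Option[String] = None // from SUBSCRIBED; later calls must carry it
    private var launched = false

    def handle(event: Event): List[Call] = event match {
      case Subscribed(id) =>
        frameworkId = Some(id)
        Nil
      case Offers(ids) if !launched && ids.nonEmpty =>
        launched = true
        Accept(ids.head) :: (if (ids.tail.nonEmpty) List(Decline(ids.tail)) else Nil)
      case Offers(ids) =>
        if (ids.nonEmpty) List(Decline(ids)) else Nil
      case Update(taskId, _, Some(uuid)) =>
        List(Acknowledge(taskId, uuid)) // updates with a uuid must be acknowledged
      case Update(_, _, None) =>
        Nil // e.g. reconciliation updates, which carry no uuid
      case Heartbeat =>
        Nil
    }
  }
}
```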

Thank you.



San


Re: Re: Status update: task 1 is in state TASK_ERROR

Posted by Benjamin Mahler <bm...@apache.org>.
The message clarifies it: the task and executor have some unreserved resources:
cpus(allocated: controller):6; mem(allocated: controller):8000

But the resources offered were reserved:
cpus(allocated: controller)(reservations: [(STATIC,controller)]):6; mem(allocated: controller)(reservations: [(STATIC,controller)]):8000; + disk + ports

The scheduler needs to provide resources that are contained in the offer; in this case, it needs to include the missing reservation metadata.
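A minimal sketch of the containment rule described above, assuming a hypothetical Res case class in which resources only add up or match when their name, allocation role and reservation stack are identical. Task and executor requests are summed per kind, which is why the log says "required by task and its executor". Under this model the unreserved request fails against the reserved offer, while a request that copies the offer's metadata succeeds. In the Java protobuf API, one way to get the same effect would be to start from an offered Resource (resource.toBuilder) and change only its scalar value, so allocation_info and reservations carry over:

```scala
object ReservationMatch {
  // Hypothetical, simplified scalar resource: name, amount, the role it is
  // allocated to, and its reservation stack as (type, role) pairs.
  final case class Res(
      name: String,
      amount: Double,
      allocatedTo: String,
      reservations: List[(String, String)])

  // Everything except the amount must be identical for two resources to combine.
  private def key(r: Res): (String, String, List[(String, String)]) =
    (r.name, r.allocatedTo, r.reservations)

  // True if, for every kind of resource, the total requested (task plus executor)
  // does not exceed the total offered of exactly that kind.
  def contains(offered: List[Res], requested: List[Res]): Boolean = {
    val available = offered.groupBy(key).map { case (k, rs) => k -> rs.map(_.amount).sum }
    requested.groupBy(key).forall { case (k, rs) =>
      available.getOrElse(k, 0.0) >= rs.map(_.amount).sum
    }
  }

  // Builds a request for `amount` of `name`, copying allocation and reservation
  // metadata from the first offered resource with that name.
  def requestFromOffer(offered: List[Res], name: String, amount: Double): Option[Res] =
    offered.find(_.name == name).map(_.copy(amount = amount))
}
```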
