Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2013/03/14 01:44:13 UTC

[jira] [Resolved] (MESOS-367) Invalid StatusUpdateMessage from missing slave id.

     [ https://issues.apache.org/jira/browse/MESOS-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone resolved MESOS-367.
------------------------------

    Resolution: Fixed
    
> Invalid StatusUpdateMessage from missing slave id.
> --------------------------------------------------
>
>                 Key: MESOS-367
>                 URL: https://issues.apache.org/jira/browse/MESOS-367
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Vinod Kone
>            Priority: Critical
>
> It looks like the ExecutorProcess sets its internal slaveId upon registration:
>   void registered(const ExecutorInfo& executorInfo,
>                   const FrameworkID& frameworkId,
>                   const FrameworkInfo& frameworkInfo,
>                   const SlaveID& slaveId,
>                   const SlaveInfo& slaveInfo)
>   {
>     if (aborted) {
>       VLOG(1) << "Ignoring registered message from slave " << slaveId
>               << " because the driver is aborted!";
>       return;
>     }
>     VLOG(1) << "Executor registered on slave " << slaveId;
> ****    this->slaveId = slaveId;   ****
>     executor->registered(driver, executorInfo, frameworkInfo, slaveInfo);
>   }
> As a result, if registration is delayed, the executor can come up and send a status update before slaveId is set, producing an incomplete protobuf:
>   void sendStatusUpdate(const TaskStatus& status)
>   {
>     VLOG(1) << "Executor sending status update for task "
>             << status.task_id() << " in state " << status.state();
>     if (status.state() == TASK_STAGING) {
>       VLOG(1) << "Executor is not allowed to send "
>               << "TASK_STAGING status updates. Aborting!";
>       driver->abort();
>       executor->error(driver, "Attempted to send TASK_STAGING status update");
>       return;
>     }
>     StatusUpdateMessage message;
>     StatusUpdate* update = message.mutable_update();
>     update->mutable_framework_id()->MergeFrom(frameworkId);
>     update->mutable_executor_id()->MergeFrom(executorId);
> ****    update->mutable_slave_id()->MergeFrom(slaveId);   ****
>     update->mutable_status()->MergeFrom(status);
>     update->set_timestamp(Clock::now());
>     update->set_uuid(UUID::random().toBytes());
>     send(slave, message);
>   }
> The ExecutorProcess should take the slaveId in its constructor to avoid this issue.
> Here are the relevant log lines:
> I0227 23:45:56.547392 38406 slave.cpp:762] Got registration for executor 'thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0' of framework 201103282247-0000000019-0000
> I0227 23:45:56.547610 38411 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000 with resources cpus=0.35; mem=176; disk=512; ports=[31385-31385]
> I0227 23:45:56.547863 38406 slave.cpp:820] Flushing queued tasks for framework 201103282247-0000000019-0000
> I0227 23:45:56.548074 38411 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 358 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> I0227 23:45:56.548812 38411 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 184549376 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
> W0227 23:45:56.663353 38408 protobuf.hpp:252] Initialization errors: update.slave_id.value
> libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
> W0227 23:45:56.673761 38400 protobuf.hpp:252] Initialization errors: update.slave_id.value
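
The ordering problem described above, and the proposed constructor-based fix, can be illustrated with a minimal standalone sketch. It does not use Mesos or protobuf: SlaveID, StatusUpdate, LateBoundExecutor and EagerBoundExecutor are hypothetical stand-ins invented for illustration, and isInitialized() only models the idea of a required-field check. It is not the actual patch for MESOS-367.

```cpp
#include <string>

// Hypothetical stand-in for a protobuf message with one required field.
struct SlaveID {
  SlaveID() : set(false) {}
  explicit SlaveID(const std::string& v) : set(true), value(v) {}
  bool set;           // true once the required 'value' field is populated
  std::string value;
};

// Hypothetical stand-in for the status update payload.
struct StatusUpdate {
  StatusUpdate(const SlaveID& id, const std::string& s)
    : slaveId(id), state(s) {}

  // Models a required-field check: the update is only complete
  // when slave_id.value has been set.
  bool isInitialized() const { return slaveId.set; }

  SlaveID slaveId;
  std::string state;
};

// Models the reported behaviour: slaveId is unset until registered() runs,
// so an update built before registration is incomplete.
class LateBoundExecutor {
public:
  void registered(const SlaveID& id) { slaveId = id; }

  StatusUpdate makeStatusUpdate(const std::string& state) const {
    return StatusUpdate(slaveId, state);
  }

private:
  SlaveID slaveId;  // default-constructed, 'value' not yet set
};

// Models the suggested fix: slaveId is supplied at construction,
// so every update carries it regardless of when registration completes.
class EagerBoundExecutor {
public:
  explicit EagerBoundExecutor(const SlaveID& id) : slaveId(id) {}

  StatusUpdate makeStatusUpdate(const std::string& state) const {
    return StatusUpdate(slaveId, state);
  }

private:
  const SlaveID slaveId;
};
```

In this sketch, LateBoundExecutor().makeStatusUpdate("TASK_RUNNING") yields an update whose isInitialized() is false until registered() has been called, which corresponds to the "missing required fields: update.slave_id.value" errors in the log. An EagerBoundExecutor never produces such an update, because it cannot exist without a slave id.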
