Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2013/03/14 01:44:13 UTC
[jira] [Resolved] (MESOS-367) Invalid StatusUpdateMessage from missing slave id.
[ https://issues.apache.org/jira/browse/MESOS-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone resolved MESOS-367.
------------------------------
Resolution: Fixed
> Invalid StatusUpdateMessage from missing slave id.
> --------------------------------------------------
>
> Key: MESOS-367
> URL: https://issues.apache.org/jira/browse/MESOS-367
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Mahler
> Assignee: Vinod Kone
> Priority: Critical
>
> It looks like the ExecutorProcess sets its internal slaveId upon registration:
>   void registered(const ExecutorInfo& executorInfo,
>                   const FrameworkID& frameworkId,
>                   const FrameworkInfo& frameworkInfo,
>                   const SlaveID& slaveId,
>                   const SlaveInfo& slaveInfo)
>   {
>     if (aborted) {
>       VLOG(1) << "Ignoring registered message from slave " << slaveId
>               << " because the driver is aborted!";
>       return;
>     }
>     VLOG(1) << "Executor registered on slave " << slaveId;
>     this->slaveId = slaveId; // <-- slaveId is only set here
>     executor->registered(driver, executorInfo, frameworkInfo, slaveInfo);
>   }
> As a result, if registration is delayed, the executor can come up and send a status update before the slaveId is set, producing an incomplete protobuf:
>   void sendStatusUpdate(const TaskStatus& status)
>   {
>     VLOG(1) << "Executor sending status update for task "
>             << status.task_id() << " in state " << status.state();
>     if (status.state() == TASK_STAGING) {
>       VLOG(1) << "Executor is not allowed to send "
>               << "TASK_STAGING status updates. Aborting!";
>       driver->abort();
>       executor->error(driver, "Attempted to send TASK_STAGING status update");
>       return;
>     }
>     StatusUpdateMessage message;
>     StatusUpdate* update = message.mutable_update();
>     update->mutable_framework_id()->MergeFrom(frameworkId);
>     update->mutable_executor_id()->MergeFrom(executorId);
>     update->mutable_slave_id()->MergeFrom(slaveId); // <-- slaveId may not be set yet
>     update->mutable_status()->MergeFrom(status);
>     update->set_timestamp(Clock::now());
>     update->set_uuid(UUID::random().toBytes());
>     send(slave, message);
>   }
> The ExecutorProcess should take the slaveId in its constructor to avoid this issue.
> Here are the relevant log lines:
> I0227 23:45:56.547392 38406 slave.cpp:762] Got registration for executor 'thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0' of framework 201103282247-0000000019-0000
> I0227 23:45:56.547610 38411 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000 with resources cpus=0.35; mem=176; disk=512; ports=[31385-31385]
> I0227 23:45:56.547863 38406 slave.cpp:820] Flushing queued tasks for framework 201103282247-0000000019-0000
> I0227 23:45:56.548074 38411 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 358 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> I0227 23:45:56.548812 38411 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 184549376 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
> W0227 23:45:56.663353 38408 protobuf.hpp:252] Initialization errors: update.slave_id.value
> libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
> W0227 23:45:56.673761 38400 protobuf.hpp:252] Initialization errors: update.slave_id.value
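
The suggestion in the description (have the ExecutorProcess take the slaveId in its constructor) can be illustrated with a minimal, self-contained sketch. This is not the actual Mesos patch: SlaveID and StatusUpdate below are simplified stand-ins for the generated protobuf classes, and LateBoundExecutorProcess, makeStatusUpdate(), isInitialized() and demonstrate() are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the protobuf SlaveID (assumption, not the real
// class): 'hasValue' models whether the required 'value' field was set.
struct SlaveID {
  SlaveID() : hasValue(false) {}
  explicit SlaveID(const std::string& v) : value(v), hasValue(true) {}
  std::string value;
  bool hasValue;
};

// Simplified stand-in for StatusUpdate (assumption). isInitialized() plays
// the role of the protobuf required-field check that fails in the logs.
struct StatusUpdate {
  SlaveID slaveId;
  bool isInitialized() const { return slaveId.hasValue; }
};

// Behaviour described in the report: slaveId is only assigned when the
// registered message arrives, so an earlier update carries an unset id.
class LateBoundExecutorProcess {
public:
  void registered(const SlaveID& id) { slaveId = id; }
  StatusUpdate makeStatusUpdate() const {
    StatusUpdate update;
    update.slaveId = slaveId;
    return update;
  }
private:
  SlaveID slaveId;
};

// Proposed shape: slaveId is supplied at construction, so every update
// built by this object carries it regardless of registration timing.
class ExecutorProcess {
public:
  explicit ExecutorProcess(const SlaveID& id) : slaveId(id) {}
  StatusUpdate makeStatusUpdate() const {
    StatusUpdate update;
    update.slaveId = slaveId;
    return update;
  }
private:
  const SlaveID slaveId;
};

// Usage: compare an update built before registration in both variants.
inline void demonstrate() {
  LateBoundExecutorProcess before;
  assert(!before.makeStatusUpdate().isInitialized()); // incomplete update

  ExecutorProcess after(SlaveID("slave-1"));          // hypothetical id
  assert(after.makeStatusUpdate().isInitialized());   // always complete
}
```

How the slaveId reaches the executor before registration (for example, through its launch environment) is outside the scope of this sketch and is not stated in the report.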