Posted to issues@aurora.apache.org by "brian wickman (JIRA)" <ji...@apache.org> on 2014/05/12 19:44:16 UTC
[jira] [Created] (AURORA-409) Executor exits with unacknowledged
updates while the slave is down, resulting in LOST tasks.
brian wickman created AURORA-409:
------------------------------------
Summary: Executor exits with unacknowledged updates while the slave is down, resulting in LOST tasks.
Key: AURORA-409
URL: https://issues.apache.org/jira/browse/AURORA-409
Project: Aurora
Issue Type: Bug
Components: Executor
Reporter: brian wickman
Originally filed by [~bmahler]
Currently, it appears as though Thermos will attempt to send status updates while the slave is down. This is correct, as the executor driver will re-send unacknowledged updates when the slave reconnects.
However, since Thermos does not wait for reregistered(), it is possible for Thermos to exit before the slave reconnects and the driver flushes the unacknowledged updates.
To ensure updates reach the slave, Thermos must wait for reregistered() before exiting if disconnected() was called. That is, if reliable status updates are desired, Thermos must not send a status update and then exit in the window between disconnected() and reregistered().
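A minimal sketch of the proposed behavior, not the actual Thermos code: a small gate records the connection state from the disconnected() and reregistered() callbacks, and the shutdown path blocks on it before sending the final update and stopping. ConnectionGate, finish, send_final_update, and stop_driver are hypothetical names introduced here for illustration; only the disconnected() and reregistered() callback names come from the report above.

```python
import threading


class ConnectionGate:
    """Tracks whether the executor driver is connected to the slave.

    Hypothetical helper for illustration; not part of Thermos or Mesos.
    """

    def __init__(self):
        self._connected = threading.Event()
        # Assumption: the gate is created after the initial registration,
        # so the executor starts out connected.
        self._connected.set()

    def disconnected(self):
        # Intended to be called from the executor's disconnected() callback.
        self._connected.clear()

    def reregistered(self):
        # Intended to be called from the executor's reregistered() callback.
        self._connected.set()

    def wait_until_connected(self, timeout=None):
        # Blocks until the gate is open; returns False if the timeout expires.
        return self._connected.wait(timeout)


def finish(gate, send_final_update, stop_driver, timeout=None):
    """Send the final update and stop, but only while connected."""
    if not gate.wait_until_connected(timeout):
        # Still disconnected: do not send and exit; the caller decides
        # whether to keep waiting.
        return False
    send_final_update()
    stop_driver()
    return True
```

With timeout=None the shutdown path waits indefinitely for reregistered(); whether a bounded wait is acceptable is a policy choice this sketch leaves to the caller.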
--
This message was sent by Atlassian JIRA
(v6.2#6252)