Posted to issues@aurora.apache.org by "brian wickman (JIRA)" <ji...@apache.org> on 2014/05/12 19:44:16 UTC

[jira] [Created] (AURORA-409) Executor exits with unacknowledged updates while the slave is down, resulting in LOST tasks.

brian wickman created AURORA-409:
------------------------------------

             Summary: Executor exits with unacknowledged updates while the slave is down, resulting in LOST tasks.
                 Key: AURORA-409
                 URL: https://issues.apache.org/jira/browse/AURORA-409
             Project: Aurora
          Issue Type: Bug
          Components: Executor
            Reporter: brian wickman


Originally filed by [~bmahler]

Currently, Thermos appears to attempt to send status updates while the slave is down. This is correct behavior, because the executor driver re-sends unacknowledged updates when the slave reconnects.
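The re-send behavior described above can be modeled roughly as follows. This is a simplified, hypothetical sketch, not the real Mesos executor driver: `ReliableUpdateBuffer`, the `transport` callable, and the `reconnected()`/`acknowledge()` method names are all assumptions made for illustration. The key idea is that an update stays pending until the slave acknowledges it, and every pending update is sent again on reconnect.

```python
import uuid


class ReliableUpdateBuffer:
    """Hypothetical, simplified model of an executor driver that keeps
    status updates until the slave acknowledges them (not Mesos code)."""

    def __init__(self, transport):
        # transport(update_id, update) stands in for the channel to the
        # slave; it is an assumption of this sketch.
        self._transport = transport
        self._connected = True
        self._pending = {}  # update_id -> update, kept in insertion order

    def send_status_update(self, update):
        update_id = str(uuid.uuid4())
        # Record the update first so it survives a send that never arrives.
        self._pending[update_id] = update
        if self._connected:
            self._transport(update_id, update)
        return update_id

    def disconnected(self):
        self._connected = False

    def reconnected(self):
        self._connected = True
        # Re-send every update the slave has not acknowledged yet.
        for update_id, update in list(self._pending.items()):
            self._transport(update_id, update)

    def acknowledge(self, update_id):
        # The slave confirmed receipt; stop tracking this update.
        self._pending.pop(update_id, None)

    def unacknowledged(self):
        return list(self._pending.values())
```

Note that the buffered updates only reach the slave if the process is still alive when `reconnected()` runs; if the executor exits first, the pending dictionary is simply lost with it.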

However, since Thermos does not wait for reregistered(), it can exit before the slave reconnects and before the driver has flushed the unacknowledged updates. Those updates are then never delivered, and the task ends up LOST.

To ensure that updates reach the slave, Thermos must wait for reregistered() before exiting if disconnected() was called. That is, if reliable status updates are desired, Thermos must not send a status update and then exit in the window between disconnected() and reregistered(); it must stay alive until reregistered() is called so that the driver can flush the pending updates.
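One way to enforce this rule is sketched below, under stated assumptions: `ExitGuard` and `exit_after_final_update` are hypothetical names, and the guard's `disconnected()`/`reregistered()` methods are meant to be called from the executor's callbacks of the same names. A `threading.Event` is set while the executor is connected, and the exit path blocks on it. The optional timeout is a design choice added here, not something the issue specifies; it keeps the executor from hanging forever if the slave never returns.

```python
import threading


class ExitGuard:
    """Hypothetical helper that blocks executor exit while the slave is
    disconnected, until reregistered() has been observed."""

    def __init__(self):
        self._connected = threading.Event()
        self._connected.set()  # assume the executor starts out registered

    # Call these from the executor's disconnected()/reregistered() callbacks.
    def disconnected(self):
        self._connected.clear()

    def reregistered(self):
        self._connected.set()

    def wait_until_safe_to_exit(self, timeout=None):
        # True once connected (immediately if never disconnected);
        # False if the timeout elapsed while still disconnected.
        return self._connected.wait(timeout)


def exit_after_final_update(send_update, guard, final_state, timeout=None):
    # Send the terminal update; the driver keeps it if the slave is down.
    send_update(final_state)
    # Do not exit until reregistered() has been seen (or the timeout hits).
    return guard.wait_until_safe_to_exit(timeout)
```

This sketch only waits for reregistered(), which is what the issue asks for; it does not track whether the driver has actually finished flushing or received acknowledgements afterwards.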



--
This message was sent by Atlassian JIRA
(v6.2#6252)