Posted to issues@mesos.apache.org by "Steven Schlansker (JIRA)" <ji...@apache.org> on 2016/07/26 21:14:20 UTC

[jira] [Created] (MESOS-5910) Operator SUBSCRIBE api should provide a method to get all events without requiring 100% uptime

Steven Schlansker created MESOS-5910:
----------------------------------------

             Summary: Operator SUBSCRIBE api should provide a method to get all events without requiring 100% uptime
                 Key: MESOS-5910
                 URL: https://issues.apache.org/jira/browse/MESOS-5910
             Project: Mesos
          Issue Type: Improvement
          Components: HTTP API, json api
    Affects Versions: 1.0.0
            Reporter: Steven Schlansker


The v1.0 Operator API adds a new SUBSCRIBE call, which returns a stream of events as they occur.  This is going to be extremely useful for monitoring and management jobs, as they can now have timely information about Mesos's operation without requiring repeated polling or other ugly solutions.

Unfortunately, the event stream returned by SUBSCRIBE always starts at the time the call is made.  This means that no consumer can reliably subscribe to "all events"; if the application goes offline (network blip, code upgrade, etc.), all events during that downtime are lost.

You could instead have a cluster of applications receiving the events and coordinating to deduplicate them to increase reliability, but this pushes a lot of complexity into clients, and I suspect most users would not do this correctly and would potentially lose events.

It would be extremely useful for a single client to be able to get a reliable event stream without requiring a single HTTP connection to be 100% available.

One possible solution is to assign every event an ID and extend the API to take a "start position" in the log.  The API would immediately stream out all events from the start position up to the tail of the log, and then continue emitting new events as they occur.  This provides a reliable way for a consumer to get "at least once" semantics on events.  The caveat is that the consumer may only be down for as long as the master retains event history, but this is a much easier pill to swallow.
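
To make the proposal concrete, here is a minimal Python sketch of the idea, not of any existing Mesos API.  EventLog, read_from, Consumer, and the retention parameter are all hypothetical names invented for illustration; the in-memory log merely stands in for whatever event history the master would retain.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: int       # hypothetical monotonically increasing event ID
    payload: str


class EventLog:
    """Bounded in-memory log standing in for the master's event history."""

    def __init__(self, retention: int):
        # Once `retention` events are held, the oldest one is dropped.
        self._events = deque(maxlen=retention)
        self._next_id = 0

    def append(self, payload: str) -> Event:
        event = Event(self._next_id, payload)
        self._events.append(event)
        self._next_id += 1
        return event

    def read_from(self, start_id: int) -> list:
        """Return every retained event with id >= start_id."""
        oldest = self._events[0].id if self._events else self._next_id
        if start_id < oldest:
            # The consumer was away longer than the retention window.
            raise LookupError(f"events before {oldest} are no longer retained")
        return [e for e in self._events if e.id >= start_id]


class Consumer:
    """Tracks its own position so it can resume after downtime."""

    def __init__(self):
        self.next_id = 0   # the "start position" sent on each (re)subscribe
        self.seen = []

    def poll(self, log: EventLog) -> None:
        for event in log.read_from(self.next_id):
            self.seen.append(event.payload)
            # Advance only after handling the event: a crash before this
            # line means the event is re-read later ("at least once").
            self.next_id = event.id + 1
```

In this sketch a consumer that misses a few events simply calls poll again and catches up from its last position, while one that has been away longer than the retention window gets an explicit error instead of a silent gap.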



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)