Posted to issues@mesos.apache.org by "Steven Schlansker (JIRA)" <ji...@apache.org> on 2016/07/26 21:33:20 UTC

[jira] [Commented] (MESOS-5910) Operator SUBSCRIBE api should provide a method to get all events without requiring 100% uptime

    [ https://issues.apache.org/jira/browse/MESOS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394584#comment-15394584 ] 

Steven Schlansker commented on MESOS-5910:
------------------------------------------

It seems that SUBSCRIBE actually gives you a snapshot of the current state when you first subscribe, so perhaps this really is only an issue during master failovers.  That makes this somewhat lower priority than I thought, although correctly handling master failover without losing events is still desirable.

> Operator SUBSCRIBE api should provide a method to get all events without requiring 100% uptime
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-5910
>                 URL: https://issues.apache.org/jira/browse/MESOS-5910
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API, json api
>    Affects Versions: 1.0.0
>            Reporter: Steven Schlansker
>
> The v1.0 Operator API adds a new SUBSCRIBE call, which returns a stream of events as they occur.  This is going to be extremely useful for monitoring and management jobs, as they can now have timely information about Mesos's operation without requiring repeated polling or other ugly solutions.
> Unfortunately, the SUBSCRIBE call always returns events starting from the time the call is made.  This means that no consumer can reliably subscribe to "all events"; if the application goes offline (network blip, code upgrade, etc.), all events during that downtime are lost.
> You could instead have a cluster of applications receiving the events and coordinating to deduplicate them to increase reliability, but this pushes a lot of complexity into clients, and I suspect most users would not do this correctly and would potentially lose events.
> It would be extremely useful for a single client to be able to get a reliable event stream without requiring a single HTTP connection to be 100% available.
> One possible solution is to assign every event an ID.  Then, extend the API to take a "start position" in the log.  The API immediately streams out all events from the start event up until the tail of the log, and then continues emitting new events as they occur.  This provides a reliable way for a consumer to get "at least once" semantics on events.  The caveat is that the consumer may only be down for as long as the master retains event history, but this is a much easier pill to swallow.  This is similar to etcd's "watch" API, if you are looking for an actual implementation to reference.
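
To make the "event ID plus start position" proposal above concrete, here is a minimal in-memory sketch in Python. It is not Mesos code and does not reflect any existing Mesos API: `EventLog`, `Event`, `HistoryTruncated`, `read_from`, and the `retention` parameter are all hypothetical names chosen for illustration, and the payload strings are placeholders. It only shows the replay-from-ID idea with a bounded history.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Event:
    id: int        # monotonically increasing ID assigned by the log
    payload: str   # placeholder for the real event body


class HistoryTruncated(Exception):
    """The requested start position is older than the retained history."""


class EventLog:
    """Hypothetical bounded event log kept by the event producer."""

    def __init__(self, retention: int) -> None:
        # Only the most recent `retention` events are kept.
        self._events: Deque[Event] = deque(maxlen=retention)
        self._next_id = 0

    def append(self, payload: str) -> Event:
        event = Event(self._next_id, payload)
        self._next_id += 1
        self._events.append(event)
        return event

    def read_from(self, start_id: int) -> List[Event]:
        """Return all retained events with id >= start_id.

        Raises HistoryTruncated if events at or after start_id have
        already been dropped, so the consumer knows it missed some.
        """
        oldest = self._events[0].id if self._events else self._next_id
        if start_id < oldest:
            raise HistoryTruncated(
                f"oldest retained event is {oldest}, requested {start_id}"
            )
        return [e for e in self._events if e.id >= start_id]
```

A consumer would remember the ID of the last event it processed and, after reconnecting, call `read_from(last_id + 1)` to catch up before tailing new events. If it instead passes `last_id`, it sees the last event twice, which is the "at least once" behavior described above. A `HistoryTruncated` error corresponds to the stated caveat: the consumer was offline for longer than the retained history covers and has to fall back to a full snapshot.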



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)