Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2018/02/08 02:36:00 UTC

[jira] [Commented] (MESOS-8469) Mesos master might drop some events in the operator API stream

    [ https://issues.apache.org/jira/browse/MESOS-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356398#comment-16356398 ] 

Greg Mann commented on MESOS-8469:
----------------------------------

{code}
commit e815417b235f9102f7740c55b700af6788bfcabb
Author: Greg Mann <gr...@mesosphere.io>
Date:   Wed Feb 7 14:51:02 2018 -0800

    Added test for delayed authorization during operator events.

    Until the fix for MESOS-8469, it was possible for the master
    operator event stream to drop events, if event-related state in
    the master changed in between asynchronous calls.

    This patch adds `MasterAPITest.EventAuthorizationDelayed` to
    verify the fix for that issue.

    Review: https://reviews.apache.org/r/65316/
{code}

> Mesos master might drop some events in the operator API stream
> --------------------------------------------------------------
>
>                 Key: MESOS-8469
>                 URL: https://issues.apache.org/jira/browse/MESOS-8469
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Greg Mann
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> Inside `Master::updateTask`, we call `Subscribers::send`, which asynchronously calls `Subscribers::Subscriber::send` on each subscriber.
> The problem is that `Subscribers::Subscriber::send` looks up the state of the master (e.g., getting `Task*` and `Framework*`), and that state might have changed between the call to `Subscribers::send` and the later execution of `Subscribers::Subscriber::send`.
>  
> For example, if a terminal task receives an acknowledgement, the task might be removed from the master's state before the subscriber's `send` runs, causing us to drop the TASK_UPDATED event.
>  
> We noticed this in an internal cluster, where a TASK_KILLED update was sent to one subscriber but not the other.
>  
>  
>  
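The race described above can be reproduced in a few lines outside of Mesos. The sketch below is only an illustration under stated assumptions: `ToyMaster`, `Task`, `sendByLookup`, `sendBySnapshot`, and `run` are hypothetical names, the `deferred` queue stands in for the asynchronous dispatch between `Subscribers::send` and `Subscribers::Subscriber::send`, and copying the task at enqueue time is one common way to avoid this kind of drop, not necessarily what the actual patch does.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a task record; not the Mesos `Task` type.
struct Task {
  std::string id;
  std::string state;
};

// Hypothetical toy master; the queue simulates asynchronous dispatch.
struct ToyMaster {
  std::map<std::string, Task> tasks;
  std::queue<std::function<void()>> deferred;
  std::vector<std::string> delivered;  // events one subscriber received

  // Racy variant: the task is looked up only when the deferred call runs.
  void sendByLookup(const std::string& id) {
    deferred.push([this, id]() {
      auto it = tasks.find(id);
      if (it == tasks.end()) {
        return;  // task already removed: the event is silently dropped
      }
      delivered.push_back(it->second.id + ":" + it->second.state);
    });
  }

  // Safer variant: copy the task while it still exists, then defer.
  void sendBySnapshot(const std::string& id) {
    auto it = tasks.find(id);
    if (it == tasks.end()) {
      return;
    }
    Task snapshot = it->second;
    deferred.push([this, snapshot]() {
      delivered.push_back(snapshot.id + ":" + snapshot.state);
    });
  }

  // Run every deferred call in FIFO order.
  void drain() {
    while (!deferred.empty()) {
      std::function<void()> f = deferred.front();
      deferred.pop();
      f();
    }
  }
};

// Enqueue an update for a terminal task, remove the task (as an
// acknowledgement might), then let the deferred send run.
std::vector<std::string> run(bool useSnapshot) {
  ToyMaster master;
  master.tasks["t1"] = Task{"t1", "TASK_KILLED"};
  if (useSnapshot) {
    master.sendBySnapshot("t1");
  } else {
    master.sendByLookup("t1");
  }
  master.tasks.erase("t1");
  master.drain();
  return master.delivered;
}
```

With these definitions, `run(false)` delivers nothing because the lookup happens after the task is gone, while `run(true)` still delivers the TASK_KILLED update because the event carries its own copy of the task.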



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)