Posted to issues@mesos.apache.org by "Gastón Kleiman (JIRA)" <ji...@apache.org> on 2019/02/13 22:46:00 UTC

[jira] [Assigned] (MESOS-9573) Agent should not try to recover operation status update streams that haven't been created yet.

     [ https://issues.apache.org/jira/browse/MESOS-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gastón Kleiman reassigned MESOS-9573:
-------------------------------------

    Assignee: Gastón Kleiman

https://reviews.apache.org/r/69977/diff/1#index_header

> Agent should not try to recover operation status update streams that haven't been created yet.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9573
>                 URL: https://issues.apache.org/jira/browse/MESOS-9573
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Gastón Kleiman
>            Assignee: Gastón Kleiman
>            Priority: Major
>              Labels: foundations, mesosphere
>
> If the agent fails over after having checkpointed a new operation but before the operation status update stream is created, the recovery process will fail.
> This happens because the agent tries to recover an operation's status update stream even if that stream hasn't been created yet.
> To prevent recovery failures, the agent should obtain the IDs of the streams to recover by walking the directory in which operation status update streams are stored.
> The agent should also garbage collect any stream for which the checkpointed state doesn't contain a corresponding operation.
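
The proposed recovery logic can be sketched roughly as follows. This is an illustrative Python sketch only, not Mesos code: the function name, the arguments, and the assumed on-disk layout (one subdirectory per stream, named after the operation ID) are hypothetical and not taken from the ticket.

```python
import shutil
from pathlib import Path


def recover_operation_streams(streams_dir, checkpointed_operation_ids):
    """Return (stream IDs to recover, stream IDs garbage collected).

    Assumption (hypothetical layout): each stream lives in its own
    subdirectory of `streams_dir`, named after the operation ID.
    """
    root = Path(streams_dir)
    to_recover, collected = [], []

    # No streams directory yet means there is nothing to recover.
    if not root.is_dir():
        return to_recover, collected

    # Discover streams by walking the directory instead of deriving the
    # list from the checkpointed operations, so an operation whose stream
    # was never created is simply skipped.
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        if entry.name in checkpointed_operation_ids:
            to_recover.append(entry.name)
        else:
            # Orphaned stream: no corresponding checkpointed operation.
            shutil.rmtree(entry)
            collected.append(entry.name)

    return to_recover, collected
```

With this approach, an operation that was checkpointed just before a failover, and so has no stream directory yet, never shows up in the list of streams to recover, which is the failure case described above.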



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)