Posted to issues@mesos.apache.org by "Gastón Kleiman (JIRA)" <ji...@apache.org> on 2019/02/13 22:46:00 UTC
[jira] [Assigned] (MESOS-9573) Agent should not try to recover
operation status update streams that haven't been created yet.
[ https://issues.apache.org/jira/browse/MESOS-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gastón Kleiman reassigned MESOS-9573:
-------------------------------------
Assignee: Gastón Kleiman
https://reviews.apache.org/r/69977/diff/1#index_header
> Agent should not try to recover operation status update streams that haven't been created yet.
> ----------------------------------------------------------------------------------------------
>
> Key: MESOS-9573
> URL: https://issues.apache.org/jira/browse/MESOS-9573
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Priority: Major
> Labels: foundations, mesosphere
>
> If the agent fails over after having checkpointed a new operation but before the operation status update stream is created, the recovery process will fail.
> This happens because the agent will try to recover an operation's status update stream even if that stream hasn't been created yet.
> To prevent recovery failures, the agent should obtain the IDs of the streams to recover by walking the directory in which operation status update streams are stored.
> The agent should also garbage collect any stream for which the checkpointed state doesn't contain a corresponding operation.
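
The approach proposed in the description can be illustrated with a minimal sketch. This is not the Mesos implementation (the agent is written in C++); it only models the idea in Python. The on-disk layout (one subdirectory per stream, named after the operation ID) and every function name below are assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable, Set, Tuple


def stream_ids_on_disk(streams_dir: Path) -> Set[str]:
    """Return the IDs of the streams that actually exist on disk.

    Assumption: each stream lives in its own subdirectory of
    `streams_dir`, named after the operation ID (hypothetical layout).
    """
    if not streams_dir.is_dir():
        return set()
    return {entry.name for entry in streams_dir.iterdir() if entry.is_dir()}


def plan_recovery(
    streams_dir: Path, checkpointed_operation_ids: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Split the on-disk streams into those to recover and those to GC.

    Streams are discovered by walking the directory, so an operation that
    was checkpointed before its stream was created is simply skipped
    instead of causing a recovery failure.
    """
    on_disk = stream_ids_on_disk(streams_dir)
    checkpointed = set(checkpointed_operation_ids)
    to_recover = on_disk & checkpointed   # stream exists, operation known
    to_collect = on_disk - checkpointed   # stream exists, operation unknown
    return to_recover, to_collect


def garbage_collect(streams_dir: Path, orphaned_ids: Iterable[str]) -> None:
    """Delete the directories of streams that have no matching operation."""
    for stream_id in orphaned_ids:
        shutil.rmtree(streams_dir / stream_id)
```

In this sketch, an operation that is present in the checkpointed state but has no stream directory appears in neither set, which corresponds to the failover case described above.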
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)