Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2019/04/10 08:58:00 UTC
[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

    [ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814210#comment-16814210 ] 

Till Rohrmann commented on FLINK-10333:
---------------------------------------

I like the idea of using transactions to ensure that only the leader can modify znodes below a certain znode (job_id, cluster_id, etc.). That way we could get rid of the clumsy locking mechanism.
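
To make the idea concrete, here is a minimal in-memory sketch of a leader-guarded write. It is only a model and not the ZooKeeper or Curator API: the class {{LeaderGuardedStore}}, its method names, and the example paths are hypothetical, and the "transaction" is simulated with a synchronized method. In real ZooKeeper the same effect would presumably be achieved with a multi-op that combines a check on the election znode with the actual write.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory stand-in for a znode tree (hypothetical, NOT the ZooKeeper API).
 * A write succeeds only if the election znode is still owned by the caller.
 */
final class LeaderGuardedStore {
    // path -> data; the election znode stores the session id of the current leader
    private final Map<String, String> znodes = new HashMap<>();

    /** Simulates the leader creating its (ephemeral) election znode. */
    synchronized void electLeader(String electionPath, String sessionId) {
        znodes.put(electionPath, sessionId);
    }

    /** Simulates the election znode disappearing, e.g. after session expiry. */
    synchronized void loseLeadership(String electionPath) {
        znodes.remove(electionPath);
    }

    /**
     * Models a transaction of the form "check leadership, then write".
     * Both steps happen atomically; if the check fails nothing is modified.
     */
    synchronized boolean writeIfLeader(
            String electionPath, String sessionId, String path, String data) {
        if (!sessionId.equals(znodes.get(electionPath))) {
            return false; // caller is not (or no longer) the leader
        }
        znodes.put(path, data);
        return true;
    }

    synchronized String read(String path) {
        return znodes.get(path);
    }

    public static void main(String[] args) {
        LeaderGuardedStore store = new LeaderGuardedStore();
        store.electLeader("/leader/job-1", "session-A");
        System.out.println(store.writeIfLeader("/leader/job-1", "session-A", "/jobs/job-1", "graph-v1"));
        System.out.println(store.writeIfLeader("/leader/job-1", "session-B", "/jobs/job-1", "graph-v2"));
        System.out.println(store.read("/jobs/job-1"));
    }
}
```

Because the leadership check and the write are a single atomic step, a contender that lost leadership cannot overwrite the stored state, which is what would make a separate lock znode unnecessary.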

I need a few more details on how the session id is generated from the {{election-node-path}}. Maybe they are covered in the design document, so it would be great if you could link it here [~Tison].

For the implementation I would be strongly in favor of adding a new {{ZooKeeperNGHighAvailabilityServices}} implementation so that we can always switch back to the old implementation. 

One problem might be that the two {{HighAvailabilityServices}} implementations would not be compatible, so users might not be able to switch HA services if they want to recover HA state. This limitation might be acceptable in order to avoid complicated backwards compatibility logic.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>
> While going over the ZooKeeper based stores ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, {{ZooKeeperCompletedCheckpointStore}}) and the underlying {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be better to move {{RetrievableStateStorageHelper}} out of it for a better separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even if it is locked. This should not happen since it could leave another system in an inconsistent state (imagine a changed {{JobGraph}} which restores from an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me wonder how reliably these components actually work. Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)