Posted to issues@flink.apache.org by "Stefan Richter (JIRA)" <ji...@apache.org> on 2019/01/28 10:31:00 UTC

[jira] [Updated] (FLINK-10043) Refactor object construction/initialization/restore code

     [ https://issues.apache.org/jira/browse/FLINK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Richter updated FLINK-10043:
-----------------------------------
    Description: 
Currently, the constructor of {{RocksDBKeyedStateBackend}} has the following shortcomings:
- It creates and cleans up some directories and files. This makes it harder to unit-test, because dependencies are created in the constructor instead of being passed in from outside.
- It leaves many important fields uninitialized, and further methods, e.g. {{restore}}, _have_ to be called before the backend object is fully constructed. This is error-prone in many ways and hard to unit-test. I think this problem originated with the introduction of incremental snapshots, because in that case we can only open a RocksDB instance AFTER the restore code has executed and restored the working directory.

As a solution, I would suggest having a dedicated builder class that takes the current constructor parameters and (optionally) the state handles to restore. This class then constructs and initializes all required objects, and only those objects are passed to the new {{RocksDBKeyedStateBackend}} constructor, which does no work besides assigning dependencies to fields.
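
A minimal Java sketch of this split, assuming hypothetical classes {{SketchBackend}} and {{SketchBackendBuilder}} (they are not Flink's actual API) and using plain strings as stand-ins for state handles. All I/O and restore work happens in {{build()}}; the backend constructor only assigns final fields.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical backend: the constructor only assigns already-built collaborators. */
final class SketchBackend {
    final Path instanceBasePath;
    final List<String> restoredHandles;

    SketchBackend(Path instanceBasePath, List<String> restoredHandles) {
        this.instanceBasePath = instanceBasePath;
        this.restoredHandles = restoredHandles;
    }
}

/** Hypothetical builder: does all directory and restore work before the backend exists. */
final class SketchBackendBuilder {
    private final Path instanceBasePath;
    private List<String> stateHandles = Collections.emptyList();

    SketchBackendBuilder(Path instanceBasePath) {
        this.instanceBasePath = instanceBasePath;
    }

    /** Optional: the state handles to restore from (strings are stand-ins here). */
    SketchBackendBuilder setStateHandles(List<String> stateHandles) {
        this.stateHandles = new ArrayList<>(stateHandles);
        return this;
    }

    SketchBackend build() throws IOException {
        // 1. Create directories (work the old constructor did itself).
        Files.createDirectories(instanceBasePath);
        // 2. "Restore": placeholder that only records which handles were given.
        List<String> restored = Collections.unmodifiableList(new ArrayList<>(stateHandles));
        // 3. Only now construct the fully initialized backend.
        return new SketchBackend(instanceBasePath, restored);
    }
}
```

Because the constructor receives finished objects, a unit test can construct {{SketchBackend}} directly with a temporary path and no builder at all.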

With this change, I would also extract the different restore strategies for incremental and full snapshots out of the backend's main class into their own classes. They will then be used by the builder introduced in the previous step. This builder would receive all objects that currently go into the constructor and the restore method. It should create all directories, download state (if applicable), create and restore a RocksDB instance object, and create and register states. Everything concerning the construction of collaborators for the backend should go into the builder, so that the backend's main class can simply receive all collaborators and assign them to final fields.
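
One way to read this is a strategy interface with one implementation per snapshot type, selected and invoked by the builder before the DB is opened. The sketch below assumes hypothetical names ({{RestoreStrategy}}, {{RestoreResult}}, {{RestoringBuilder}}, etc.); the strategies are placeholders that only count handles and return a directory name.

```java
import java.util.List;

/** Hypothetical outcome of a restore: what the builder needs before opening the DB. */
final class RestoreResult {
    final String workingDirectory;
    final int restoredHandleCount;

    RestoreResult(String workingDirectory, int restoredHandleCount) {
        this.workingDirectory = workingDirectory;
        this.restoredHandleCount = restoredHandleCount;
    }
}

/** One implementation per snapshot type, pulled out of the backend class. */
interface RestoreStrategy {
    RestoreResult restore(List<String> stateHandles);
}

/** Placeholder for restoring from full snapshots. */
final class FullSnapshotRestoreStrategy implements RestoreStrategy {
    @Override
    public RestoreResult restore(List<String> stateHandles) {
        return new RestoreResult("fresh-instance-dir", stateHandles.size());
    }
}

/** Placeholder for restoring from incremental snapshots (restores the working directory). */
final class IncrementalSnapshotRestoreStrategy implements RestoreStrategy {
    @Override
    public RestoreResult restore(List<String> stateHandles) {
        return new RestoreResult("restored-instance-dir", stateHandles.size());
    }
}

final class RestoreStrategies {
    static RestoreStrategy forSnapshotType(boolean incremental) {
        return incremental ? new IncrementalSnapshotRestoreStrategy() : new FullSnapshotRestoreStrategy();
    }
}

/** Hypothetical builder that owns the restore step. */
final class RestoringBuilder {
    private final RestoreStrategy strategy;

    RestoringBuilder(RestoreStrategy strategy) {
        this.strategy = strategy;
    }

    String build(List<String> stateHandles) {
        // Restore first: the DB is only "opened" on the directory the strategy produced.
        RestoreResult result = strategy.restore(stateHandles);
        return "opened:" + result.workingDirectory;
    }
}
```

Keeping the ordering (restore, then open) inside the builder is what removes the need for a half-constructed backend.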

One detail to consider for the builder is that all resources for collaborators should be created and initialized in a resource-acquisition-is-initialization (RAII) style, in particular because some of them are backed by native (JNI) objects: if we fail to create a resource during the process, all previously created resources should be properly released and de-allocated. Release/de-allocation should happen in the exact inverse order of creation, to avoid any transitive double-frees in the native code. Only when all resources have been created will the builder create the main backend object, so that, again, the constructor does not have to deal with any fault handling or cleanup logic.
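
Java's try-with-resources already closes multiple declared resources in reverse order, but a builder must keep its resources on success and release them only on failure. A minimal sketch of that, assuming the native-backed collaborators are {{AutoCloseable}}: a hypothetical {{ResourceGuard}} records resources in creation order, closes them newest-first if construction fails, and hands them over once everything exists. {{FakeNativeHandle}} and {{GuardDemo}} are illustrative stand-ins, not real RocksDB or Flink classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical guard: releases registered resources in exact reverse order of creation. */
final class ResourceGuard implements AutoCloseable {
    private final Deque<AutoCloseable> acquired = new ArrayDeque<>();

    <T extends AutoCloseable> T register(T resource) {
        acquired.push(resource); // newest resource sits at the head
        return resource;
    }

    /** Called once all resources exist; afterwards close() releases nothing. */
    List<AutoCloseable> transferOwnership() {
        List<AutoCloseable> owned = new ArrayList<>(acquired); // newest first
        acquired.clear();
        return owned;
    }

    /** Closes whatever is still registered, newest first, collecting failures. */
    @Override
    public void close() throws Exception {
        Exception first = null;
        while (!acquired.isEmpty()) {
            AutoCloseable resource = acquired.pop();
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) first = e; else first.addSuppressed(e);
            }
        }
        if (first != null) throw first;
    }
}

/** Stand-in for a JNI-backed object; records when it is closed. */
final class FakeNativeHandle implements AutoCloseable {
    private final String name;
    private final List<String> closeLog;

    FakeNativeHandle(String name, List<String> closeLog) {
        this.name = name;
        this.closeLog = closeLog;
    }

    @Override
    public void close() {
        closeLog.add(name);
    }
}

final class GuardDemo {
    /** Acquires three resources; optionally fails before the third one. */
    static List<AutoCloseable> build(List<String> closeLog, boolean failLast) throws Exception {
        try (ResourceGuard guard = new ResourceGuard()) {
            guard.register(new FakeNativeHandle("options", closeLog));
            guard.register(new FakeNativeHandle("db", closeLog));
            if (failLast) {
                throw new IllegalStateException("simulated failure creating column family");
            }
            guard.register(new FakeNativeHandle("columnFamily", closeLog));
            return guard.transferOwnership(); // success: nothing is closed here
        }
    }
}
```

On the failure path the close log is {{db}} then {{options}}, i.e. the inverse of creation; on success nothing is closed and the backend would receive the three handles.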

  was:
Currently, the constructor of {{RocksDBKeyedStateBackend}} has the following shortcomings:
- It initializes and cleans up some directories and files. This makes it harder to unit-test, because dependencies are created in the constructor instead of being passed in from outside.
- It leaves many important fields uninitialized, and further methods, e.g. {{restore}}, _have_ to be called before the object is fully constructed. This is error-prone in many ways and hard to unit-test. I think this problem originated with the introduction of incremental snapshots, because in that case we can only open a RocksDB instance AFTER the restore code has executed and restored the working directory.

As a solution, I would suggest having a dedicated builder class that takes the current constructor parameters and (optionally) the state handles to restore. This class then constructs and initializes all dependencies, and only those dependencies are passed to the new {{RocksDBKeyedStateBackend}} constructor, which does no work besides assigning dependencies to fields.

With this change, I would also extract the different restore strategies for incremental and full snapshots out of the main class, into their own classes. They will then be used in the newly introduced builder from the previous step.


> Refactor object construction/initialization/restore code
> --------------------------------------------------------
>
>                 Key: FLINK-10043
>                 URL: https://issues.apache.org/jira/browse/FLINK-10043
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)