You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Martijn Visser (Jira)" <ji...@apache.org> on 2022/03/10 08:11:00 UTC

[jira] [Commented] (FLINK-26556) Refactoring MiniCluster and TestingMiniCluster

    [ https://issues.apache.org/jira/browse/FLINK-26556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504072#comment-17504072 ] 

Martijn Visser commented on FLINK-26556:
----------------------------------------

[~slinkydeveloper] Just pinging you because you might be interested in this. 

> Refactoring MiniCluster and TestingMiniCluster
> ----------------------------------------------
>
>                 Key: FLINK-26556
>                 URL: https://issues.apache.org/jira/browse/FLINK-26556
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.16.0
>            Reporter: Niklas Semmler
>            Priority: Major
>
> After working with the MiniCluster & TestingMiniCluster on FLINK-25235, I saw some shortcomings that I patched up but did not fix. I would recommend that we change it for 1.16, because it is becoming more and more difficult to understand what is happening.
> *Background*
> - We recently changed the implementation of the leader election from a separate {{LeaderElectionDriver}} per JobManager component (e.g., dispatcher, rest server, etc.) to a combined {{LeaderElectionDriver}} for the JobManager as a whole. (For Zoo Keeper based leader election we previously used a {{ZooKeeperLeaderElectionDriver}}. Now we use {{MultipleComponentLeaderElectionDriverAdapter}} which de-multiplexes the leader election for all JobManager components via a single {{MultipleComponentLeaderElectionService}} to a single {{ZooKeeperMultipleComponentLeaderElectionDriver}} underneath.) This changed the mapping between {{HighAvailabilityServices}} and {{LeaderElectionDriver}} from a one-to-many to a one-to-one relationship.
> - {{HighAvailabilityServices}} is the class responsible for high availability (e.g., leader fail-over). However, even though JobManager and {{TaskManager}} have a dependecy on this class, not all scenarios require high availability. The implementation {{AbstractNonHaServices}} and its {{EmeddedHaServices}} (single JVM setup) and {{StandaloneHaServices}} (no support for JobManager failures) are used for these scenarios.
> - {{MiniCluster}} is the class responsible for managing a single-node Flink cluster. It contains a single {{HighAvailabilityServices}} object that is shared by the JobManager and multiple {{TaskManager}}.
> - {{TestingMiniCluster}} extends the {{MiniCluster}} for test purposes. Among others, it is used for tests on leader election between multiple JobManager.
> *Issues with MiniCluster*
> - The {{MiniCluster}} is meant to "to execute Flink jobs locally". To me this means that the {{MiniCluster}} _should_ only use {{EmbeddedHaServices}} as it does not need high availability on a single JVM. However, it does not put any constraints on the type of high availability services used.
> - Although the {{MiniCluster}} is production code. it contains code that is meant exclusively for testing. {{MiniCluster#getHaLeadershipControl}} is used to give a test explicit control over the leader election of an {{EmbeddedHaService}}. Btw, this method depends on {{HighAvailabilityServices}} being an {{EmbeddedHaServicesWithLeadershipControl}} object.
> - The code to create the {{HighAvailabilityServices}} object seems needlessly complex. In {{MiniCluster#createHighAvailabilityServices}} we differentiate between {{EmbeddedWithControlHighAvailabilityServices}} and everything else. In {{HighAvailabilityServicesUtils}} we distinguish between {{EmbeddedHaServices}}, {{ZooKeeperHaServices}} &{{ZooKeeperMultipleComponentLeaderElectionHaServices}}, and ones that are produced by {{HighAvailabilityServicesFactory}}. It would be nice, if we could flatten this into one method. Especially with the additional option to configure it via the {{TestingMiniCluster}} (see below). (I also don't understand why there is no Kubernetes option. Even though there is a {{KubernetesHaServicesFactory}}.) 
> *Issues with TestingMiniCluster*
> - The name TestingSomething is usually used for mock objects. In contrast, the {{TestingMiniCluster}} does not mock anything.
> - Its doc string says its used "to set a custom {@link HighAvailabilityServices}", but the {{MiniCluster}} already allows it.
> - The use of the {{TestingMiniCluster}} seems to be to configure the number of {{JobManagers}}, configure the {{HighAvailabilityServices}} (which looks redundant see below), and to configure the {{TaskManager}} to use only local communication even when more than one {{TaskManager}} exists. Why can this not be optional settings on the MiniCluster itself?
> - The {{TestingMiniCluster}} is used by tests that need multiple JobManager with separate leader election (e.g., ZooKeeperLeaderElectionITCase) and some that require that all parts including the {{TaskExecutor}} share the same {{HighAvailabilityServices}} (e.g., JobExecutionITCase).
> - {{TestingMiniCluster}} allows overriding the method {{MiniCluster#createHighAvailabilityServices}} for creating a {{HighAvailabilityServices}}. However, that method already has a two step process of creating the {{HighAvailabilityServices}}. The existing process even includes the option of using a custom factory. Again, this redundancy makes the code hard to understand.
> *Conclusion*
> - The {{MiniCluster}} and {{TestingMiniCluster}} cover a large variety of different use cases that are related to what implementation of {{HighAvaibilityServices}} is used. Splitting these classes into one per use case would greatly improve the readability.
> - The reason for using inheritance is not clear to me. First, the interaction between the subclass and parent class is hard to follow. Second, I don't see in what scenarios you would want to replace an instance of {{MiniCluster}} with {{TestingMiniCluster}}, so there should be no need to implement the same methods. (Could be wrong on this though.) Third, having a single non-abstract parent class with a single subclass on its own sounds like a degenerate inheritance tree.
> *Refactoring options*
> - We can strongly couple the {{MiniCluster}} with the type of {{HighlyAvailabilityServices}} and have one {{EmbeddedMiniCluster}} and one {{HighlyAvailableMiniCluster}}.
> - Alternatively, we can split the generation of the {{HighlyAvailabilityServices}} object into a separate factory (and hopefully make the generation process less deep). With some additional settings, the MiniCluster could then take over the role of the {{TestingMiniCluster}}.
> PS: To be exact, I should have used the term {{DispatcherResourceManagerComponent}} instead of JobManager, but this would've made the text even harder to read.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)