Posted to server-dev@james.apache.org by ro...@apache.org on 2019/10/16 15:32:42 UTC

[james-project] branch master updated: JAMES-2813 Add Architecture Decision Record about Distributed Task Manager

This is an automated email from the ASF dual-hosted git repository.

rouazana pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git


The following commit(s) were added to refs/heads/master by this push:
     new fe0af0b  JAMES-2813 Add Architecture Decision Record about Distributed Task Manager
fe0af0b is described below

commit fe0af0b23c6154dc4e1f69df89c6296e351794b7
Author: Gautier DI FOLCO <gd...@linagora.com>
AuthorDate: Wed Oct 2 11:57:33 2019 +0200

    JAMES-2813 Add Architecture Decision Record about Distributed Task Manager
---
 src/adr/0002-make-taskmanager-distributed.md       | 25 +++++++++++++++++++
 src/adr/0003-distributed-workqueue.md              | 26 +++++++++++++++++++
 src/adr/0004-distributed-tasks-listing.md          | 20 +++++++++++++++
 ...-distributed-task-termination-ackowledgement.md | 25 +++++++++++++++++++
 src/adr/0006-task-serialization.md                 | 29 ++++++++++++++++++++++
 src/adr/0007-distributed-task-cancellation.md      | 21 ++++++++++++++++
 src/adr/0008-distributed-task-await.md             | 22 ++++++++++++++++
 7 files changed, 168 insertions(+)

diff --git a/src/adr/0002-make-taskmanager-distributed.md b/src/adr/0002-make-taskmanager-distributed.md
new file mode 100644
index 0000000..a13c62d
--- /dev/null
+++ b/src/adr/0002-make-taskmanager-distributed.md
@@ -0,0 +1,25 @@
+# 2. Make TaskManager Distributed
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+In order to have a distributed version of James we need to have a homogeneous way to deal with `Task`s.
+
+Currently, every James node of a cluster has its own instance of `TaskManager` and has no knowledge of the others, making it impossible to orchestrate task execution at the cluster level.
+Tasks are run on the same node on which they are scheduled.
+
+We are also unable to list or access the details of all the `Task`s of a cluster.
+
+## Decision
+
+Create a distribution-aware implementation of `TaskManager`.
+
+## Consequences
+
+ * Split the `TaskManager` part dealing with the coordination (`Task` management and view) and the `Task` execution (located in `TaskManagerWorker`)
+ * The distributed `TaskManager` will rely on RabbitMQ to coordinate and the event system to synchronize states
diff --git a/src/adr/0003-distributed-workqueue.md b/src/adr/0003-distributed-workqueue.md
new file mode 100644
index 0000000..832a93e
--- /dev/null
+++ b/src/adr/0003-distributed-workqueue.md
@@ -0,0 +1,26 @@
+# 3. Distributed WorkQueue
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+By switching the task manager to a distributed implementation, we need to be able to run a `Task` on any node of the cluster.
+
+## Decision
+
+  For the time being we will keep the sequential execution property of the task manager.
+  This is an intermediate milestone toward the final implementation which will drop this property.
+
+ * Use a RabbitMQ queue as a workqueue into which only the `Created` events are pushed.
+   This queue will be exclusive and events will be consumed serially. Technically this means the queue will be consumed with a `prefetch = 1`.
+   The queue consumer will listen to the worker on the same node and will ack the message only once the task has finished (`Completed`, `Failed`, `Cancelled`).
+
+## Consequences
+
+ * It's a temporary solution that is not safe to use in production: if the node promoted to exclusive listener of the queue dies, no more tasks will be run.
+ * The serial execution of tasks does not leverage cluster scalability.
+
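To make the serial-consumption property of ADR 3 concrete, here is a minimal in-memory sketch in Java. It is not the James implementation and does not use RabbitMQ: `SerialWorkQueueSketch`, `Task`, `Outcome`, `submit` and `drain` are hypothetical names chosen for illustration. With the RabbitMQ Java client, the same behaviour would presumably rely on `Channel.basicQos(1)` and on calling `Channel.basicAck` only after the task has ended.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory stand-in for the exclusive RabbitMQ work queue.
// No RabbitMQ client is involved; every name here is illustrative.
final class SerialWorkQueueSketch {
    enum Outcome { COMPLETED, FAILED }

    interface Task {
        void run() throws Exception;
    }

    private final Queue<Task> createdTasks = new ArrayDeque<>();
    private final List<Outcome> acknowledged = new ArrayList<>();

    // Corresponds to pushing a `Created` event onto the work queue.
    void submit(Task task) {
        createdTasks.add(task);
    }

    // Single consumer: takes one task at a time (the `prefetch = 1` behaviour)
    // and records the "ack" only once the task has reached a final state.
    List<Outcome> drain() {
        Task next;
        while ((next = createdTasks.poll()) != null) {
            Outcome outcome;
            try {
                next.run();
                outcome = Outcome.COMPLETED;
            } catch (Exception e) {
                outcome = Outcome.FAILED;
            }
            acknowledged.add(outcome);
        }
        return acknowledged;
    }
}
```

Because the next task is only taken after the previous one has been acknowledged, a failing task does not block the queue, but a consumer that dies mid-task would, which is the production risk listed in the consequences.
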
diff --git a/src/adr/0004-distributed-tasks-listing.md b/src/adr/0004-distributed-tasks-listing.md
new file mode 100644
index 0000000..a9dc089
--- /dev/null
+++ b/src/adr/0004-distributed-tasks-listing.md
@@ -0,0 +1,20 @@
+# 4. Distributed Tasks listing
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+By switching the task manager to a distributed implementation, we need to be able to `list` all `Task`s running on the cluster.
+
+## Decision
+
+ * Read a Cassandra projection to get all `Task`s and their `Status`
+
+## Consequences
+
+ * A Cassandra projection has to be created
+ * The `EventSourcingSystem` should have a `Listener` updating the `Projection`
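The listener-updates-projection idea of ADR 4 can be sketched as follows. This is only an illustration under assumptions: an in-memory map stands in for the Cassandra table, and `TaskListingSketch`, `Status`, `onEvent` and `list` are hypothetical names, not James APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a map plays the role of the Cassandra projection.
final class TaskListingSketch {
    enum Status { WAITING, IN_PROGRESS, COMPLETED, FAILED }

    // Projection: task id -> latest known status.
    private final Map<String, Status> projection = new LinkedHashMap<>();

    // Listener side: called for every event emitted by the event system,
    // it overwrites the status stored for the task.
    void onEvent(String taskId, Status newStatus) {
        projection.put(taskId, newStatus);
    }

    // Read side: `list` never touches the workers, only the projection.
    Map<String, Status> list() {
        return new LinkedHashMap<>(projection);
    }

    // Variant of `list` restricted to a single status.
    Map<String, Status> list(Status wanted) {
        Map<String, Status> result = new LinkedHashMap<>();
        for (Map.Entry<String, Status> entry : projection.entrySet()) {
            if (entry.getValue() == wanted) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

Any node can then answer a listing request from the shared projection, regardless of which node is running the task.
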
diff --git a/src/adr/0005-distributed-task-termination-ackowledgement.md b/src/adr/0005-distributed-task-termination-ackowledgement.md
new file mode 100644
index 0000000..f125b8a
--- /dev/null
+++ b/src/adr/0005-distributed-task-termination-ackowledgement.md
@@ -0,0 +1,25 @@
+# 5. Distributed Task termination acknowledgement
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+By switching the task manager to a distributed implementation, we need to be able to execute a `Task` on any node of the cluster.
+We need a way for nodes to be signaled of any termination event so that we can notify blocking clients.
+
+## Decision
+
+ * Create a `RabbitMQEventHandler` which publishes the `Event`s pushed to the task manager's event system to RabbitMQ
+ * All the events which end a `Task` (`Completed`, `Failed`, and `Canceled`) have to be transmitted to other nodes
+
+## Consequences
+
+ * A new kind of `Event`s should be created: `TerminationEvent` which includes `Completed`, `Failed`, and `Canceled`
+ * `TerminationEvent`s will be broadcast on an exchange which will be bound to all interested components later
+ * `EventSourcingSystem.dispatch` should use `RabbitMQ` to dispatch `Event`s instead of triggering local `Listener`s
+ * Any node can be notified when a `Task` emits a termination event
+
diff --git a/src/adr/0006-task-serialization.md b/src/adr/0006-task-serialization.md
new file mode 100644
index 0000000..7530b0d
--- /dev/null
+++ b/src/adr/0006-task-serialization.md
@@ -0,0 +1,29 @@
+# 6. Task serialization
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+By switching the task manager to a distributed implementation, we need to be able to execute a `Task` on any node of the cluster.
+We need to have a way to describe the `Task` to be executed and to serialize it so that it can be stored in the `Created` event, which will be persisted in the Event Store and sent on the event bus.
+
+At this point in time a `Task` can contain any arbitrary code. It's not an element of a finite set of actions.
+
+## Decision
+
+ * Create a `Factory` for one `Task`
+ * Inject a `Factory` `Registry` via a Guice Module
+ * The `Task` `Serialization` will be done in JSON, taking inspiration from `EventSerializer`
+ * Every `Task` should have a specific integration test demonstrating that serialization works
+ * Each `Task` is responsible for dealing with the different versions of the serialized information, if any
+
+
+## Consequences
+
+ * Every `Task` should be serializable.
+ * Every `Task` should provide a `Factory` responsible for deserializing the task and instantiating it.
+ * Every `Factory` should be registered through a Guice module, to be created in each project containing a `Factory`
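The factory-registry approach of ADR 6 can be illustrated with the sketch below. It is an assumption-laden simplification: a `Map<String, String>` stands in for the JSON document (a real implementation would use a JSON library), registration is done by hand rather than through Guice, and `TaskFactoryRegistrySketch`, `ReindexTask` and the `"type"` / `"mailbox"` field names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a registry that picks a Factory from a type field.
final class TaskFactoryRegistrySketch {
    interface Task {
        // Flat key/value view standing in for the JSON representation.
        Map<String, String> serialize();
    }

    interface Factory {
        Task create(Map<String, String> fields);
    }

    // Example task, invented for this sketch.
    static final class ReindexTask implements Task {
        static final String TYPE = "reindex";
        final String mailbox;

        ReindexTask(String mailbox) {
            this.mailbox = mailbox;
        }

        @Override
        public Map<String, String> serialize() {
            Map<String, String> fields = new HashMap<>();
            fields.put("type", TYPE);
            fields.put("mailbox", mailbox);
            return fields;
        }
    }

    private final Map<String, Factory> factories = new HashMap<>();

    // In James this registration would come from a Guice module.
    void register(String type, Factory factory) {
        factories.put(type, factory);
    }

    // Looks up the Factory matching the serialized type and delegates to it.
    Task deserialize(Map<String, String> fields) {
        String type = fields.get("type");
        Factory factory = factories.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("No factory registered for task type: " + type);
        }
        return factory.create(fields);
    }
}
```

A node that receives a `Created` event only needs the registry to rebuild the task, for example after `registry.register(ReindexTask.TYPE, fields -> new ReindexTask(fields.get("mailbox")))`. An unknown type fails fast, which is the kind of error the per-task serialization tests are meant to catch.
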
diff --git a/src/adr/0007-distributed-task-cancellation.md b/src/adr/0007-distributed-task-cancellation.md
new file mode 100644
index 0000000..784dd5b
--- /dev/null
+++ b/src/adr/0007-distributed-task-cancellation.md
@@ -0,0 +1,21 @@
+# 7. Distributed Task cancellation
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+A `Task` could be run on any node of the cluster. To interrupt it we need to notify all nodes of the cancel request.
+
+## Decision
+
+* We will add an EventHandler to broadcast the `CancelRequested` event to all the workers listening on a RabbitMQ broadcasting exchange.
+
+* The `TaskManager` should register to the exchange and will apply `cancel` on the `TaskManagerWorker` if the `Task` is waiting or in progress on it.
+
+## Consequences
+
+* The task manager's event system should be bound to the RabbitMQ exchange which publishes the `TerminationEvent`s
diff --git a/src/adr/0008-distributed-task-await.md b/src/adr/0008-distributed-task-await.md
new file mode 100644
index 0000000..2f7e1c0
--- /dev/null
+++ b/src/adr/0008-distributed-task-await.md
@@ -0,0 +1,22 @@
+# 8. Distributed Task await
+
+Date: 2019-10-02
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+By switching the task manager to a distributed implementation, we need to be able to `await` a `Task` running on any node of the cluster.
+
+## Decision
+
+ * Broadcast `Event`s in `RabbitMQ`
+
+## Consequences
+
+ * `RabbitMQTaskManager` should broadcast termination `Event`s (`Completed`|`Failed`|`Canceled`)
+ * `RabbitMQTaskManager.await` should: first, check the `Task`'s state; and if it's not terminated, listen to RabbitMQ
+ * The await should have a timeout limit
+
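The two-step `await` of ADR 8 (check the state first, then listen, with a timeout) could look like the sketch below. It is an in-memory illustration only: `TaskAwaitSketch`, `onTerminationEvent` and the `Optional` return type are assumptions, and the call to `onTerminationEvent` stands in for a termination event received from the RabbitMQ exchange.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of await: check the known state, then wait for an event.
final class TaskAwaitSketch {
    enum Status { COMPLETED, FAILED, CANCELED }

    // Only terminal statuses are recorded in this sketch.
    private final Map<String, Status> terminated = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<Status>> terminations = new ConcurrentHashMap<>();

    private CompletableFuture<Status> terminationOf(String taskId) {
        return terminations.computeIfAbsent(taskId, id -> new CompletableFuture<>());
    }

    // Stand-in for a termination event delivered by the broadcast exchange.
    void onTerminationEvent(String taskId, Status finalStatus) {
        terminated.put(taskId, finalStatus);
        terminationOf(taskId).complete(finalStatus);
    }

    // Returns the final status, or empty if the timeout elapses first.
    Optional<Status> await(String taskId, Duration timeout) {
        Status current = terminated.get(taskId);
        if (current != null) {
            return Optional.of(current); // step 1: already terminated
        }
        try {
            // Step 2: wait for the event. The future is kept per task, so an
            // event arriving between step 1 and this call is not lost.
            return Optional.of(terminationOf(taskId).get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Returning an empty `Optional` on timeout is a choice made for this sketch; throwing or returning the last known status would be equally valid ways to honour the timeout limit.
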

