
Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/07/27 17:24:27 UTC

[GitHub] [incubator-seatunnel] ic4y opened a new issue, #2279: [ST-Engine][Design] TaskExecutionService and Task related design

ic4y opened a new issue, #2279:
URL: https://github.com/apache/incubator-seatunnel/issues/2279

### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

TaskExecutionService is the service that executes Tasks; one instance runs on each node. It receives a TaskGroup from the JobMaster and runs the Tasks in it. It also maintains a TaskID->TaskContext mapping, and the specific operations on a Task are encapsulated in its TaskContext. Each Task holds an OperationService internally, so a Task can remotely call and communicate with other Tasks or the JobMaster through the OperationService.
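
The TaskID->TaskContext bookkeeping described above could be sketched roughly as follows. This is only an illustration: the method names (`deployTaskGroup`, `getContext`, `cancel`) and the shape of `Task` and `TaskContext` are assumptions made for the example, not the proposed API, and the OperationService is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TaskRegistrySketch {
    /** Hypothetical Task: only an ID is needed for this sketch. */
    interface Task { long getTaskId(); }

    /** Hypothetical wrapper that encapsulates the operations on one Task. */
    static final class TaskContext {
        final Task task;
        volatile boolean cancelled;
        TaskContext(Task task) { this.task = task; }
        void cancel() { cancelled = true; }
    }

    /** Hypothetical per-node service; one instance would run on each node. */
    static final class TaskExecutionService {
        private final Map<Long, TaskContext> contexts = new ConcurrentHashMap<>();

        /** Registers every Task of a deployed TaskGroup under its task ID. */
        void deployTaskGroup(List<Task> taskGroup) {
            for (Task t : taskGroup) {
                contexts.put(t.getTaskId(), new TaskContext(t));
            }
        }

        /** Returns the context for a task ID, or null if no such Task runs here. */
        TaskContext getContext(long taskId) { return contexts.get(taskId); }
    }

    public static void main(String[] args) {
        TaskExecutionService service = new TaskExecutionService();
        Task first = () -> 1L;
        Task second = () -> 2L;
        service.deployTaskGroup(List.of(first, second));
        service.getContext(2L).cancel();
        System.out.println(service.getContext(2L).cancelled);
    }
}
```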

TaskGroup design:
The tasks in a TaskGroup all run on the same node.
<img width="432" alt="image" src="https://user-images.githubusercontent.com/83933160/181298022-3cd6a80a-3c14-427c-a19b-48cbc1e7c0d8.png">

An optimization point:
The data channel between Tasks within the same TaskGroup uses a local queue. The data channel between different TaskGroups may use a distributed queue (Hazelcast Ringbuffer), because those TaskGroups may run on different nodes.

Task design:
One of the most important methods of Task is call(). The executor drives a Task by calling its call() method repeatedly. call() returns a ProgressState, from which the executor determines whether the Task has ended or whether call() needs to be invoked again.
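
A minimal sketch of this call()-driven loop, under stated assumptions: the two `ProgressState` values shown here and the helper names `CountingTask` and `runToCompletion` are invented for illustration; the issue only specifies that the return value tells the executor whether to call again.

```java
class CallLoopSketch {
    /** Hypothetical states: keep calling, or stop. */
    enum ProgressState { MADE_PROGRESS, DONE }

    /** Hypothetical Task: each call() performs one bounded unit of work. */
    interface Task { ProgressState call(); }

    /** Toy Task that reports DONE after a fixed number of calls. */
    static final class CountingTask implements Task {
        private int remaining;
        CountingTask(int steps) { this.remaining = steps; }

        @Override
        public ProgressState call() {
            remaining--;
            return remaining <= 0 ? ProgressState.DONE : ProgressState.MADE_PROGRESS;
        }
    }

    /** Drives one Task until it reports DONE; returns how often call() ran. */
    static int runToCompletion(Task task) {
        int calls = 0;
        ProgressState state;
        do {
            state = task.call();
            calls++;
        } while (state != ProgressState.DONE);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(runToCompletion(new CountingTask(3))); // prints 3
    }
}
```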

Thread Share optimization:
Thread Share background: when a large number of small synchronization jobs run at once, a large number of Tasks is generated. If each Task occupies its own thread, resources are wasted on running that many threads. If one thread could run multiple Tasks, the situation would improve greatly. But how can one thread execute multiple Tasks at the same time? Because a Task is driven by calling call() again and again, a thread can call call() on each of the Tasks it is responsible for in turn.
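
The turn-taking idea can be sketched as a round-robin loop on a single thread. The names below (`runShared`, `counting`) and the two-value `ProgressState` are assumptions for the example only; a real executor would also need error handling and cancellation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SharedThreadSketch {
    enum ProgressState { MADE_PROGRESS, DONE }

    interface Task { ProgressState call(); }

    /** Runs all Tasks on the calling thread, one call() per Task per turn. */
    static void runShared(List<Task> tasks) {
        Deque<Task> queue = new ArrayDeque<>(tasks);
        while (!queue.isEmpty()) {
            Task task = queue.pollFirst();
            if (task.call() != ProgressState.DONE) {
                queue.addLast(task); // unfinished: wait for the next turn
            }
        }
    }

    /** Toy Task that logs its name on every call and finishes after `steps` calls. */
    static Task counting(String name, int steps, List<String> log) {
        int[] remaining = {steps};
        return () -> {
            log.add(name);
            return --remaining[0] <= 0 ? ProgressState.DONE : ProgressState.MADE_PROGRESS;
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        runShared(List.of(counting("A", 2, log), counting("B", 1, log)));
        System.out.println(log); // prints [A, B, A]
    }
}
```

Because every Task only gets the thread back after all the others have had a turn, the latency of each Task depends on the slowest call() in the group.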

This approach also introduces a problem: if the call() of one Task takes a very long time, the thread is occupied for that whole time, and the other Tasks sharing it are delayed severely.

For this problem, I can currently think of the following two optimization options:

Option1: Marking Thread Share

Provide a marker on the Task that states whether it supports Thread Share. The marker is set in the concrete implementation of the Task. Tasks that can be shared are executed together on a shared thread, and Tasks that cannot be shared each get a thread of their own.

Whether a Task supports thread sharing is decided by the implementer of the Task, based on the execution time of its call() method. If every execution of call() completes at the millisecond level, the Task can be marked as supporting thread sharing.
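
As a rough sketch of Option1, the marker could be a method on the Task that the executor uses to split Tasks into a shared group and an exclusive group. The method name `isThreadShare`, the `Plan` class, and the single shared thread are assumptions for illustration, and call() is reduced to `void` to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class MarkedShareSketch {
    /** Hypothetical Task with an implementer-supplied marker. */
    interface Task {
        void call();

        /** Defaults to false: a Task gets its own thread unless it opts in. */
        default boolean isThreadShare() { return false; }
    }

    /** Result of splitting Tasks by their marker. */
    static final class Plan {
        final List<Task> shared = new ArrayList<>();
        final List<Task> exclusive = new ArrayList<>();

        /** One thread per exclusive Task plus one thread for all shared Tasks. */
        int threadCount() {
            return exclusive.size() + (shared.isEmpty() ? 0 : 1);
        }
    }

    static Plan plan(List<Task> tasks) {
        Plan plan = new Plan();
        for (Task task : tasks) {
            (task.isThreadShare() ? plan.shared : plan.exclusive).add(task);
        }
        return plan;
    }

    /** Builds a no-op Task with the given marker value. */
    static Task task(boolean threadShare) {
        return new Task() {
            @Override public void call() { }
            @Override public boolean isThreadShare() { return threadShare; }
        };
    }

    public static void main(String[] args) {
        Plan plan = plan(List.of(task(true), task(true), task(false)));
        System.out.println(plan.threadCount()); // prints 2
    }
}
```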

Option2: Dynamic Thread Share

Option1 has a fundamental problem: the execution time of call() is often not fixed, and the Task itself does not know how long its call() will take, because different stages, different data volumes, and other factors all affect it. Marking such a Task statically as shareable or not shareable is therefore not appropriate. If a Task is marked as shareable and one call() takes a very long time, the other Tasks sharing the same thread suffer a very high delay. If the Task is marked as not shareable, the resource waste problem remains unsolved.

So thread sharing can be made dynamic: a group of Tasks is executed by a thread pool (the number of Tasks >> the number of threads). While thread1 is executing Task1, if the execution time of Task1's call() exceeds the configured value (100ms), another thread, thread2, is taken from the thread pool to execute the call() of the next Task, Task2. This guarantees that the delay of the other Tasks does not grow too high because of Task1's long execution time. When Task2's call() completes normally within the timeout, Task2 is put back at the end of the task queue, and thread2 goes on to take Task3 from the task queue and execute its call(). When Task1's call() finally returns, thread1 is put back into the thread pool and Task1 is recorded as having timed out once. When the number of timeouts of a Task's call() reaches a certain limit, the Task is removed from the shared task queue and is executed exclusively by its own thread.
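
The following is one possible sketch of this dynamic handoff, not the proposed implementation. Assumptions made for the example: a watchdog timer detects the timeout and starts a replacement worker, a `CountDownLatch` tracks completion, and all names (`DynamicShareSketch`, `workerLoop`, `runExclusive`, `sleepingTask`) are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class DynamicShareSketch {
    enum ProgressState { MADE_PROGRESS, DONE }
    interface Task { ProgressState call(); }

    final long timeoutMs;
    final int timeoutLimit;
    final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    final Map<Task, Integer> timeouts = new ConcurrentHashMap<>();
    final Set<Task> exclusive = ConcurrentHashMap.newKeySet();
    final ExecutorService pool = Executors.newCachedThreadPool();
    final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    final CountDownLatch allDone;

    DynamicShareSketch(Collection<Task> tasks, long timeoutMs, int timeoutLimit) {
        this.timeoutMs = timeoutMs;
        this.timeoutLimit = timeoutLimit;
        this.queue.addAll(tasks);
        this.allDone = new CountDownLatch(tasks.size());
    }

    /** Starts one shared worker and blocks until every Task has reported DONE. */
    void run() {
        pool.execute(this::workerLoop);
        try {
            allDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.shutdownNow();
        pool.shutdown();
    }

    private void workerLoop() {
        try {
            while (allDone.getCount() > 0) {
                Task task = queue.poll(10, TimeUnit.MILLISECONDS);
                if (task == null) continue;
                AtomicBoolean settled = new AtomicBoolean();
                // If call() is still running after the timeout, a new thread takes over the queue.
                ScheduledFuture<?> watchdog = timer.schedule(() -> {
                    if (settled.compareAndSet(false, true)) pool.execute(this::workerLoop);
                }, timeoutMs, TimeUnit.MILLISECONDS);
                ProgressState state = task.call();
                boolean timedOut = !settled.compareAndSet(false, true);
                watchdog.cancel(false);
                if (state == ProgressState.DONE) {
                    allDone.countDown();
                } else if (timedOut && timeouts.merge(task, 1, Integer::sum) >= timeoutLimit) {
                    exclusive.add(task); // too many timeouts: leave the shared queue
                    pool.execute(() -> runExclusive(task));
                } else {
                    queue.add(task); // back to the end of the shared queue
                }
                if (timedOut) return; // the replacement worker now serves the queue
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Drives a single Task on its own thread until it is done. */
    private void runExclusive(Task task) {
        while (task.call() != ProgressState.DONE) { }
        allDone.countDown();
    }

    /** Toy Task whose call() sleeps for sleepMs and finishes after `calls` calls. */
    static Task sleepingTask(int calls, long sleepMs) {
        AtomicInteger left = new AtomicInteger(calls);
        return () -> {
            try { Thread.sleep(sleepMs); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return left.decrementAndGet() <= 0 ? ProgressState.DONE : ProgressState.MADE_PROGRESS;
        };
    }

    public static void main(String[] args) {
        Task slow = sleepingTask(3, 400);
        Task fast = sleepingTask(5, 0);
        DynamicShareSketch sketch = new DynamicShareSketch(List.of(slow, fast), 100, 2);
        sketch.run();
        System.out.println("slow promoted to exclusive: " + sketch.exclusive.contains(slow));
    }
}
```

In this sketch a thread whose call() timed out stops serving the shared queue once the call returns, so at any moment exactly one worker that is not stuck in a long call() is polling the queue.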

The related execution process is as follows:

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org