Posted to dev@dolphinscheduler.apache.org by C Yinan <yi...@live.com> on 2020/07/09 10:04:34 UTC
Re: Discuss on new feature for dependent node (delay-waiting plan discussion)


As I understand it, the difference between Plan A and Plan B is this: Plan A has only one waiting time, which is fixed and not set by the user, while Plan B lets the user set two separate waiting times, one for waiting for the dependency to start and one for waiting for it to complete.



If my understanding is correct, then even with Plan A I think the waiting time should be set by the user, because the execution times of different tasks can vary greatly.



On the other hand, I don't think letting users set the start-waiting time and the completion-waiting time separately in Plan B is a redundant design. Generally speaking, if a node's time to start has come and it still has not started after a while, something is probably wrong with that node, and this can be judged quickly. The node's running time, however, is likely to be much longer than that. If the two waiting times are not distinguished, consider a node that never starts running: something is obviously wrong with the node itself, but because its expected running time is very long, the configured waiting time is also very long. The task then takes a long time to enter the failure process, even though it went wrong much earlier.
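To make this argument concrete, here is a small Python simulation with made-up numbers: a dependency that is expected to run for about 6 hours, a start-wait of 10 minutes, and a dependency that never starts. It polls once a minute and reports when each policy would fail the dependent node. The function names and numbers are illustrative assumptions, not DolphinScheduler code.

```python
from typing import Callable, Optional

MINUTE = 60
HOUR = 60 * MINUTE


def first_failure_time(should_fail: Callable[[int], bool],
                       poll_interval_s: int, horizon_s: int) -> Optional[int]:
    """Poll a dependency that never starts and return the first poll time
    (seconds) at which the policy fails the dependent node, or None if it
    does not fail within horizon_s."""
    t = 0
    while t <= horizon_s:
        if should_fail(t):
            return t
        t += poll_interval_s
    return None


def single_limit(t: int) -> bool:
    # One waiting time, sized for the long expected run (hypothetical 6 h).
    return t >= 6 * HOUR


def separate_limits(t: int) -> bool:
    # A separate start-wait limit (hypothetical 10 min) applies while not started.
    return t >= 10 * MINUTE


print(first_failure_time(single_limit, MINUTE, 24 * HOUR))     # 21600 (6 hours)
print(first_failure_time(separate_limits, MINUTE, 24 * HOUR))  # 600 (10 minutes)
```

With a single limit, the broken node is only reported after the full 6 hours; with a separate start-wait it is reported after 10 minutes.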


From: wang<ma...@163.com>
Sent: July 9, 2020 13:24
To: dev@dolphinscheduler.apache.org<ma...@dolphinscheduler.apache.org>
Subject: Discuss on new feature for dependent node (delay-waiting plan discussion)

Dependent node new feature: waiting for dependencies to start
Current Behavior

Currently, when a dependent node executes and finds that a task node it depends on has not yet started, the dependent node is immediately judged as failed.

New Feature Target

When execution reaches the dependent node, if any task node in its list of dependencies is not yet ready, the dependent node waits for a while. If all the dependency task nodes start within the waiting time, execution continues; otherwise, the dependent node fails.
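A minimal Python sketch of the target decision, assuming the dependent node polls its dependencies periodically. `DepState`, `Decision`, `check_started`, and `start_wait_seconds` are hypothetical names, not DolphinScheduler classes or attributes; setting `start_wait_seconds` to 0 reproduces the current fail-immediately behavior.

```python
from enum import Enum


class DepState(Enum):
    NOT_STARTED = "not_started"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


class Decision(Enum):
    WAIT = "wait"
    CONTINUE = "continue"
    FAIL = "fail"


def check_started(states, waited_seconds, start_wait_seconds):
    """Decide what a dependent node does while some dependencies may not have started.

    states: DepState of each dependency task node.
    waited_seconds: time since the dependent node began waiting.
    start_wait_seconds: hypothetical waiting limit; 0 reproduces the
        current behavior (fail as soon as any dependency has not started).
    """
    if all(s is not DepState.NOT_STARTED for s in states):
        # Every dependency has started: continue with the normal logic
        # that waits for the dependencies to complete.
        return Decision.CONTINUE
    if waited_seconds < start_wait_seconds:
        # Still inside the waiting window: poll again later.
        return Decision.WAIT
    # Waiting window exhausted (or zero): the dependent node fails.
    return Decision.FAIL
```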

Implementation Plans
Plan A

Use the existing timeout mechanism and treat waiting for a dependency to start and waiting for it to complete as equivalent states. In the original mechanism, once a dependency task node has started, the dependent node waits indefinitely for it to complete if no timeout attribute is set; if the timeout attribute is set, the waiting time is limited.

With this plan, the new dependent node will behave as follows:

If the timeout attribute is not set:

If the dependent node finds that a dependency task node has not started, it waits indefinitely for it to start (previously it failed immediately);
For a dependency task node that has already started, the behavior is unchanged.

If the timeout attribute is set:

If the dependent node finds that a dependency task node has not started, it waits a limited time for it to start; if the wait times out, this is treated the same as the dependency timing out while running;
For a dependency task node that has already started, the behavior is unchanged.
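A minimal sketch of Plan A's per-dependency decision, in Python for illustration only. `DepState`, `plan_a_decision`, and the single `waited_seconds` clock are assumptions, not DolphinScheduler's actual classes or API; `timeout_seconds` stands in for the existing timeout attribute, with `None` meaning it is not set. "Not started" and "running" fall through the same branch, so one limit bounds both.

```python
from enum import Enum
from typing import Optional


class DepState(Enum):
    NOT_STARTED = "not_started"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


def plan_a_decision(state: DepState, waited_seconds: float,
                    timeout_seconds: Optional[float]) -> str:
    """Return "continue", "wait" or "fail" for one dependency under Plan A.

    timeout_seconds stands in for the existing timeout attribute;
    None means the attribute is not set.
    """
    if state is DepState.SUCCESS:
        return "continue"
    if state is DepState.FAILED:
        return "fail"
    # NOT_STARTED and RUNNING are treated as the same waiting state,
    # so one limit bounds both.
    if timeout_seconds is None or waited_seconds < timeout_seconds:
        return "wait"
    # Not starting in time is handled like a running timeout.
    return "fail"
```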

Advantages:

Simple to implement with few code changes; the front end could even be left unchanged.

Disadvantages:

Front-end developers might be out of a job;
Cannot distinguish between the two concepts of waiting to start and waiting to complete (question: is it important for users to distinguish these two concepts?)

Plan B

A dedicated delay-waiting attribute is added to dependent nodes. Although the attribute is called "delay-waiting", conceptually it is closer to a "timeout setting".
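For comparison with Plan A, a sketch of Plan B under the same kind of assumptions: two hypothetical attributes, `start_wait_seconds` and `completion_timeout_seconds`, each applied only to the phase it names. The attribute names and how each clock is measured (time spent waiting for the start vs. time spent running) are assumptions, not the proposal's actual design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DepState(Enum):
    NOT_STARTED = "not_started"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class PlanBSettings:
    # Hypothetical attribute names; None means "no limit".
    start_wait_seconds: Optional[float]
    completion_timeout_seconds: Optional[float]


def plan_b_decision(state: DepState, seconds_in_state: float,
                    settings: PlanBSettings) -> str:
    """Return "continue", "wait" or "fail" for one dependency under Plan B.

    seconds_in_state: for NOT_STARTED, how long the dependent node has
    waited for the dependency to start; for RUNNING, how long the
    dependency has been running (assumed clock semantics).
    """
    if state is DepState.SUCCESS:
        return "continue"
    if state is DepState.FAILED:
        return "fail"
    # Pick the limit that matches the phase the dependency is in.
    if state is DepState.NOT_STARTED:
        limit = settings.start_wait_seconds
    else:
        limit = settings.completion_timeout_seconds
    if limit is None or seconds_in_state < limit:
        return "wait"
    return "fail"
```

A short start-wait combined with a long completion timeout lets a dependency that never starts fail quickly without cutting off a long-running one.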

Advantages:

Relatively simple, with few code changes;
Allows separate timeouts to be set for waiting for the dependency to start and waiting for it to complete.

Disadvantages:

The two attributes are conceptually similar, so the design looks redundant.

Conclusion

As for the question above: a dependency task node that has not started, one that is running but has timed out, and one that failed to run are, in my view, all simply reasons for the dependent node to fail, and there is no significant difference between them.

I personally prefer Plan A.
