You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@dolphinscheduler.apache.org by wang <ww...@163.com> on 2020/07/29 10:04:08 UTC

[DISCUSS][SUMMER-2020] Time setting of the task instance

Discussion: Time setting

I wrote a function to delay the execution of the task, but in my discussion with my mentor, there was a disagreement on how to assign the start time in the task instance.

Problem Description

In the original design, the start time of the task instance will be initialized or changed in several places:

In the createTaskInstance() method in dolphinscheduler-server/.../server/master/runner/MasterExecThread.java
dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.javawill update the start time once the task is submitted
If a failed task tries to retry, it will be updated in the submitTaskInstanecToDB() method in dolphinscheduler-service/.../service/process/ProcessService.java
After the worker is dispatched, an ACK is sent in the process() method in dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.javato update the start time

Most tasks (such as tasks that are dispatched to workers) have two start times from the creation of an instance to the completion of execution. Let us briefly analyze what problems will arise in this way. For example, there is a function to view the running time of a task that uses this start time, and then a task is created at 10:00, and a worker receives the task at 10:05. According to the original design, I can see that the task has been running for 4 minutes at 10:04, but I will see that the task has been running for 1 minute at 10:06 (because the worker will send an ACK to update the start time at 10:05), which is obviously unreasonable.

Let me first define a few key time points (condition nodes and dependent nodes are considered separately):

Time when the master created the task instance
Worker receiving task instance time
Worker delay execution start time
Time when the worker starts to execute the task
Worker execution task completion time

Then define the time between each time point:

1->2: Waiting time in the queue (including submission to the database, removal from the database, submission to the task queue, and removal from the task queue)
3->4: Delay execution time
4->5: Task execution time
My mentor's point of view

He defines startTime in TaskInstance as time point 4. Before this time point, this value should not be null, and thinks that it is better to initialize to new Date(). At the same time, he believes that the original design should be maintained, and the value can be updated multiple times if necessary.

My point of view

I also defined the startTime in TaskInstance as time point 4, but I think this value should remain nullbefore this time point, and this value is assigned only once in the life cycle of a task instance. Will no longer be updated afterwards.

My reasons are as follows:

A variable should not be assigned before it meets its definition;
Multiple updates will lead to inconsistencies in reading data multiple times, such as the example of checking the running time of the task mentioned earlier;
There is a member submitTime in TaskInstance. In the original design, its value is very close to startTime. When the system load is small, they are almost always the same. So where you need to use before startTime initialization, you can use submitTime instead;
However, if the startTime before initialization needs to be used somewhere in the code, then whether this time should be called startTime is open to question;
Use IDEA for code analysis, startTime is rarely used.

In addition, I found that in the code for viewing the Gantt chart, there is a line of code that is written like this:

Date startTime = taskInstance.getStartTime() == null? New Date(): taskInstance.getStartTime();

It is considered here that startTime can be null, which supports my point from the side.

The above questions can be summed up as follows: The original design of the system is ambiguous in the definition of the task start time.

Impact of improvement

The code of startTime that has not been initialized may be used. I found two places:

checkTaskAfterWorkerStart() in dolphinscheduler-server/../server/zk/ZKMasterClient.java
getRemaintime() in dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java

For the first part, since I am not familiar with this part of the code, I don't know whether adding a judgment can complete the original logic. But for the second part, since the worker may crash at any time between time 2 and 5, if the startTime is initialized without sending an ACK before the crash, the master will not be able to determine whether it has timed out and take corresponding actions. I do not have a good solution to this problem for the time being, and welcome your comments.

At last

For the startTime of taskInstance, how do you think it is better to initialize? Do I need to make improvements based on my point of view? Or whether to keep the original design? Or have other suggestions?

PS:

The same problem exists for the endTime of taskInstance. After the worker finishes executing the task, the sendResult will update this time, but the master will also update this time after waitTaskQuit.

讨论:时间的设置

我编写了一个使任务延迟执行的功能,但是在我与我的导师的讨论中,就任务实例(taskInstance)中的开始时间(startTime)的如何赋值发生了分歧。

问题描述

在原先的设计中,任务实例的开始时间会在多处得到初始化或者变更:

dolphinscheduler-server/.../server/master/runner/MasterExecThread.java中的createTaskInstance()方法中
dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java会在提交任务后更新一次开始时间
如果是失败任务尝试重试,会在dolphinscheduler-service/.../service/process/ProcessService.java中的submitTaskInstanecToDB()方法中得到更新
worker被派发任务后,dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java中process()方法中发送了一个ACK去更新开始时间

大多数任务(例如被派发到worker的任务)从创建实例到执行完成,会有两个开始时间。让我们简单分析一下这样会出现什么问题。比如现在有一个查看任务已运行时间的功能使用到了这个开始时间,然后有一个任务在10:00创建,在10:05时有一个worker接收了这个任务。按照原先的设计,我在10:04时能够查看到任务已运行时间为4分钟,但是我将在10:06时查看到任务已运行时间变短为1分钟(因为10:05时worker会发送一个ACK去更新开始时间),这显然是不合理的。

下面我先定义几个关键的时间点(条件节点和依赖节点单独考虑):

master创建任务实例时间
worker接收任务实例时间
worker延迟执行开始时间
worker开始执行任务时间
worker执行任务完成时间

然后就每个时间点之间的时间作出定义:

1->2:在队列中的等待时间(包括提交到数据库、从数据库中取出、提交到任务队列、从任务队列中取出)
3->4:延迟执行时间
4->5:任务执行时间
我的导师的观点

他将TaskInstance中的startTime定义为时间点4,在该时间点之前这个值不应该为null,并且认为初始化为new Date()比较好。同时他认为应保持原先的设计,必要时可以对该值进行多次更新。

我的观点

我也将TaskInstance中的startTime定义为时间点4,但是我认为在该时间点之前这个值应该保持为null,并且在一个任务实例的生命周期中,这个值有且只有一次被赋值,赋值后不再被更新。

我的理由有以下几点:

一个变量尚未满足它的定义之前不应该被赋值;
多次更新将导致多次读取数据不一致,例如前面提到的查看任务已运行时间的例子;
TaskInstance中有一个成员submitTime,在原先的设计中它的值跟startTime非常接近,在系统负载较小的时候,它们俩几乎总是保持一致。所以在startTime初始化之前需要用到的地方可以使用submitTime代替;
但是,如果代码某处需要用到初始化之前的startTime,那么这个时间是否应该叫做startTime就有待商榷了;
使用IDEA进行代码分析,startTime很少被用到。

此外,我发现在查看甘特图的代码中,有一行代码是这么写的:

Date startTime = taskInstance.getStartTime() == null ? new Date() : taskInstance.getStartTime();

在这里就考虑到了startTime是可以为null的,这从侧面佐证了我的观点。

上面的问题总结起来就是:系统原先的设计在任务开始时间的定义上就是模棱两可的。

改进带来的影响

可能会用到尚未初始化的startTime的代码我发现两处:

dolphinscheduler-server/../server/zk/ZKMasterClient.java中的checkTaskAfterWorkerStart()
dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java中的getRemaintime()

对于第一处,由于我还不熟悉这部分代码,不知道增加一个判断能否完成原有的逻辑。但是对于第二处,由于worker可能会在时间点2到5之间的任意时刻奔溃,如果奔溃之前没有发送ACK将startTime初始化,master将无法判断是否超时并作出相应的动作。对于这个问题我暂时没有好的解决方案,欢迎大家提意见。

最后

对于taskInstance的startTime,大家觉得如何初始化比较好?是否需要按照我的观点进行改进?或者是否保持原有的设计?或者有其他的建议?

PS:

对于taskInstance的endTime也有同样的问题,worker在执行完任务之后,会sendResult更新这个时间,但master在waitTaskQuit之后也会去更新一次这个时间。

Re:Re: [DISCUSS][SUMMER-2020] Time setting of the task instance

Posted by wang <ww...@163.com>.
Thank you for your reply and support, I will try my best to improve the code and keep the original functions unchanged. In addition, I submitted the same improved ISSUE in Github. You and others are welcome to discuss and make suggestions there.


https://github.com/apache/incubator-dolphinscheduler/issues/3352





在 2020-07-30 09:31:44,"wu shaoj" <ga...@apache.org> 写道:
>A variable should not be assigned before it meets its definition, which should be a development principle.
>Take your point. The start time should be assigned just once. Status should be added to a task as an attribute which indicate current status, such as PREPARE/ACKED/RUNNING/STOPED
>
>
>On 2020/7/29, 18:24, "wang" <ww...@163.com> wrote:
>
>    Discussion: Time setting
>
>    I wrote a function to delay the execution of the task, but in my discussion with my mentor, there was a disagreement on how to assign the start time in the task instance.
>
>    Problem Description
>
>    In the original design, the start time of the task instance will be initialized or changed in several places:
>
>    In the createTaskInstance() method in dolphinscheduler-server/.../server/master/runner/MasterExecThread.java
>    dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.javawill update the start time once the task is submitted
>    If a failed task tries to retry, it will be updated in the submitTaskInstanecToDB() method in dolphinscheduler-service/.../service/process/ProcessService.java
>    After the worker is dispatched, an ACK is sent in the process() method in dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.javato update the start time
>
>    Most tasks (such as tasks that are dispatched to workers) have two start times from the creation of an instance to the completion of execution. Let us briefly analyze what problems will arise in this way. For example, there is a function to view the running time of a task that uses this start time, and then a task is created at 10:00, and a worker receives the task at 10:05. According to the original design, I can see that the task has been running for 4 minutes at 10:04, but I will see that the task has been running for 1 minute at 10:06 (because the worker will send an ACK to update the start time at 10:05), which is obviously unreasonable.
>
>    Let me first define a few key time points (condition nodes and dependent nodes are considered separately):
>
>    Time when the master created the task instance
>    Worker receiving task instance time
>    Worker delay execution start time
>    Time when the worker starts to execute the task
>    Worker execution task completion time
>
>    Then define the time between each time point:
>
>    1->2: Waiting time in the queue (including submission to the database, removal from the database, submission to the task queue, and removal from the task queue)
>    3->4: Delay execution time
>    4->5: Task execution time
>    My mentor's point of view
>
>    He defines startTime in TaskInstance as time point 4. Before this time point, this value should not be null, and thinks that it is better to initialize to new Date(). At the same time, he believes that the original design should be maintained, and the value can be updated multiple times if necessary.
>
>    My point of view
>
>    I also defined the startTime in TaskInstance as time point 4, but I think this value should remain nullbefore this time point, and this value is assigned only once in the life cycle of a task instance. Will no longer be updated afterwards.
>
>    My reasons are as follows:
>
>    A variable should not be assigned before it meets its definition;
>    Multiple updates will lead to inconsistencies in reading data multiple times, such as the example of checking the running time of the task mentioned earlier;
>    There is a member submitTime in TaskInstance. In the original design, its value is very close to startTime. When the system load is small, they are almost always the same. So where you need to use before startTime initialization, you can use submitTime instead;
>    However, if the startTime before initialization needs to be used somewhere in the code, then whether this time should be called startTime is open to question;
>    Use IDEA for code analysis, startTime is rarely used.
>
>    In addition, I found that in the code for viewing the Gantt chart, there is a line of code that is written like this:
>
>    Date startTime = taskInstance.getStartTime() == null? New Date(): taskInstance.getStartTime();
>
>    It is considered here that startTime can be null, which supports my point from the side.
>
>    The above questions can be summed up as follows: The original design of the system is ambiguous in the definition of the task start time.
>
>    Impact of improvement
>
>    The code of startTime that has not been initialized may be used. I found two places:
>
>    checkTaskAfterWorkerStart() in dolphinscheduler-server/../server/zk/ZKMasterClient.java
>    getRemaintime() in dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java
>
>    For the first part, since I am not familiar with this part of the code, I don't know whether adding a judgment can complete the original logic. But for the second part, since the worker may crash at any time between time 2 and 5, if the startTime is initialized without sending an ACK before the crash, the master will not be able to determine whether it has timed out and take corresponding actions. I do not have a good solution to this problem for the time being, and welcome your comments.
>
>    At last
>
>    For the startTime of taskInstance, how do you think it is better to initialize? Do I need to make improvements based on my point of view? Or whether to keep the original design? Or have other suggestions?
>
>    PS:
>
>    The same problem exists for the endTime of taskInstance. After the worker finishes executing the task, the sendResult will update this time, but the master will also update this time after waitTaskQuit.
>
>    讨论:时间的设置
>
>    我编写了一个使任务延迟执行的功能,但是在我与我的导师的讨论中,就任务实例(taskInstance)中的开始时间(startTime)的如何赋值发生了分歧。
>
>    问题描述
>
>    在原先的设计中,任务实例的开始时间会在多处得到初始化或者变更:
>
>    dolphinscheduler-server/.../server/master/runner/MasterExecThread.java中的createTaskInstance()方法中
>    dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java会在提交任务后更新一次开始时间
>    如果是失败任务尝试重试,会在dolphinscheduler-service/.../service/process/ProcessService.java中的submitTaskInstanecToDB()方法中得到更新
>    worker被派发任务后,dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java中process()方法中发送了一个ACK去更新开始时间
>
>    大多数任务(例如被派发到worker的任务)从创建实例到执行完成,会有两个开始时间。让我们简单分析一下这样会出现什么问题。比如现在有一个查看任务已运行时间的功能使用到了这个开始时间,然后有一个任务在10:00创建,在10:05时有一个worker接收了这个任务。按照原先的设计,我在10:04时能够查看到任务已运行时间为4分钟,但是我将在10:06时查看到任务已运行时间变短为1分钟(因为10:05时worker会发送一个ACK去更新开始时间),这显然是不合理的。
>
>    下面我先定义几个关键的时间点(条件节点和依赖节点单独考虑):
>
>    master创建任务实例时间
>    worker接收任务实例时间
>    worker延迟执行开始时间
>    worker开始执行任务时间
>    worker执行任务完成时间
>
>    然后就每个时间点之间的时间作出定义:
>
>    1->2:在队列中的等待时间(包括提交到数据库、从数据库中取出、提交到任务队列、从任务队列中取出)
>    3->4:延迟执行时间
>    4->5:任务执行时间
>    我的导师的观点
>
>    他将TaskInstance中的startTime定义为时间点4,在该时间点之前这个值不应该为null,并且认为初始化为new Date()比较好。同时他认为应保持原先的设计,必要时可以对该值进行多次更新。
>
>    我的观点
>
>    我也将TaskInstance中的startTime定义为时间点4,但是我认为在该时间点之前这个值应该保持为null,并且在一个任务实例的生命周期中,这个值有且只有一次被赋值,赋值后不再被更新。
>
>    我的理由有以下几点:
>
>    一个变量尚未满足它的定义之前不应该被赋值;
>    多次更新将导致多次读取数据不一致,例如前面提到的查看任务已运行时间的例子;
>    TaskInstance中有一个成员submitTime,在原先的设计中它的值跟startTime非常接近,在系统负载较小的时候,它们俩几乎总是保持一致。所以在startTime初始化之前需要用到的地方可以使用submitTime代替;
>    但是,如果代码某处需要用到初始化之前的startTime,那么这个时间是否应该叫做startTime就有待商榷了;
>    使用IDEA进行代码分析,startTime很少被用到。
>
>    此外,我发现在查看甘特图的代码中,有一行代码是这么写的:
>
>    Date startTime = taskInstance.getStartTime() == null ? new Date() : taskInstance.getStartTime();
>
>    在这里就考虑到了startTime是可以为null的,这从侧面佐证了我的观点。
>
>    上面的问题总结起来就是:系统原先的设计在任务开始时间的定义上就是模棱两可的。
>
>    改进带来的影响
>
>    可能会用到尚未初始化的startTime的代码我发现两处:
>
>    dolphinscheduler-server/../server/zk/ZKMasterClient.java中的checkTaskAfterWorkerStart()
>    dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java中的getRemaintime()
>
>    对于第一处,由于我还不熟悉这部分代码,不知道增加一个判断能否完成原有的逻辑。但是对于第二处,由于worker可能会在时间点2到5之间的任意时刻奔溃,如果奔溃之前没有发送ACK将startTime初始化,master将无法判断是否超时并作出相应的动作。对于这个问题我暂时没有好的解决方案,欢迎大家提意见。
>
>    最后
>
>    对于taskInstance的startTime,大家觉得如何初始化比较好?是否需要按照我的观点进行改进?或者是否保持原有的设计?或者有其他的建议?
>
>    PS:
>
>    对于taskInstance的endTime也有同样的问题,worker在执行完任务之后,会sendResult更新这个时间,但master在waitTaskQuit之后也会去更新一次这个时间。

Re: [DISCUSS][SUMMER-2020] Time setting of the task instance

Posted by wu shaoj <ga...@apache.org>.
A variable should not be assigned before it meets its definition, which should be a development principle.
Take your point. The start time should be assigned just once. Status should be added to a task as an attribute which indicate current status, such as PREPARE/ACKED/RUNNING/STOPED


On 2020/7/29, 18:24, "wang" <ww...@163.com> wrote:

    Discussion: Time setting

    I wrote a function to delay the execution of the task, but in my discussion with my mentor, there was a disagreement on how to assign the start time in the task instance.

    Problem Description

    In the original design, the start time of the task instance will be initialized or changed in several places:

    In the createTaskInstance() method in dolphinscheduler-server/.../server/master/runner/MasterExecThread.java
    dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.javawill update the start time once the task is submitted
    If a failed task tries to retry, it will be updated in the submitTaskInstanecToDB() method in dolphinscheduler-service/.../service/process/ProcessService.java
    After the worker is dispatched, an ACK is sent in the process() method in dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.javato update the start time

    Most tasks (such as tasks that are dispatched to workers) have two start times from the creation of an instance to the completion of execution. Let us briefly analyze what problems will arise in this way. For example, there is a function to view the running time of a task that uses this start time, and then a task is created at 10:00, and a worker receives the task at 10:05. According to the original design, I can see that the task has been running for 4 minutes at 10:04, but I will see that the task has been running for 1 minute at 10:06 (because the worker will send an ACK to update the start time at 10:05), which is obviously unreasonable.

    Let me first define a few key time points (condition nodes and dependent nodes are considered separately):

    Time when the master created the task instance
    Worker receiving task instance time
    Worker delay execution start time
    Time when the worker starts to execute the task
    Worker execution task completion time

    Then define the time between each time point:

    1->2: Waiting time in the queue (including submission to the database, removal from the database, submission to the task queue, and removal from the task queue)
    3->4: Delay execution time
    4->5: Task execution time
    My mentor's point of view

    He defines startTime in TaskInstance as time point 4. Before this time point, this value should not be null, and thinks that it is better to initialize to new Date(). At the same time, he believes that the original design should be maintained, and the value can be updated multiple times if necessary.

    My point of view

    I also defined the startTime in TaskInstance as time point 4, but I think this value should remain nullbefore this time point, and this value is assigned only once in the life cycle of a task instance. Will no longer be updated afterwards.

    My reasons are as follows:

    A variable should not be assigned before it meets its definition;
    Multiple updates will lead to inconsistencies in reading data multiple times, such as the example of checking the running time of the task mentioned earlier;
    There is a member submitTime in TaskInstance. In the original design, its value is very close to startTime. When the system load is small, they are almost always the same. So where you need to use before startTime initialization, you can use submitTime instead;
    However, if the startTime before initialization needs to be used somewhere in the code, then whether this time should be called startTime is open to question;
    Use IDEA for code analysis, startTime is rarely used.

    In addition, I found that in the code for viewing the Gantt chart, there is a line of code that is written like this:

    Date startTime = taskInstance.getStartTime() == null? New Date(): taskInstance.getStartTime();

    It is considered here that startTime can be null, which supports my point from the side.

    The above questions can be summed up as follows: The original design of the system is ambiguous in the definition of the task start time.

    Impact of improvement

    The code of startTime that has not been initialized may be used. I found two places:

    checkTaskAfterWorkerStart() in dolphinscheduler-server/../server/zk/ZKMasterClient.java
    getRemaintime() in dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java

    For the first part, since I am not familiar with this part of the code, I don't know whether adding a judgment can complete the original logic. But for the second part, since the worker may crash at any time between time 2 and 5, if the startTime is initialized without sending an ACK before the crash, the master will not be able to determine whether it has timed out and take corresponding actions. I do not have a good solution to this problem for the time being, and welcome your comments.

    At last

    For the startTime of taskInstance, how do you think it is better to initialize? Do I need to make improvements based on my point of view? Or whether to keep the original design? Or have other suggestions?

    PS:

    The same problem exists for the endTime of taskInstance. After the worker finishes executing the task, the sendResult will update this time, but the master will also update this time after waitTaskQuit.

    讨论:时间的设置

    我编写了一个使任务延迟执行的功能,但是在我与我的导师的讨论中,就任务实例(taskInstance)中的开始时间(startTime)的如何赋值发生了分歧。

    问题描述

    在原先的设计中,任务实例的开始时间会在多处得到初始化或者变更:

    dolphinscheduler-server/.../server/master/runner/MasterExecThread.java中的createTaskInstance()方法中
    dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java会在提交任务后更新一次开始时间
    如果是失败任务尝试重试,会在dolphinscheduler-service/.../service/process/ProcessService.java中的submitTaskInstanecToDB()方法中得到更新
    worker被派发任务后,dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java中process()方法中发送了一个ACK去更新开始时间

    大多数任务(例如被派发到worker的任务)从创建实例到执行完成,会有两个开始时间。让我们简单分析一下这样会出现什么问题。比如现在有一个查看任务已运行时间的功能使用到了这个开始时间,然后有一个任务在10:00创建,在10:05时有一个worker接收了这个任务。按照原先的设计,我在10:04时能够查看到任务已运行时间为4分钟,但是我将在10:06时查看到任务已运行时间变短为1分钟(因为10:05时worker会发送一个ACK去更新开始时间),这显然是不合理的。

    下面我先定义几个关键的时间点(条件节点和依赖节点单独考虑):

    master创建任务实例时间
    worker接收任务实例时间
    worker延迟执行开始时间
    worker开始执行任务时间
    worker执行任务完成时间

    然后就每个时间点之间的时间作出定义:

    1->2:在队列中的等待时间(包括提交到数据库、从数据库中取出、提交到任务队列、从任务队列中取出)
    3->4:延迟执行时间
    4->5:任务执行时间
    我的导师的观点

    他将TaskInstance中的startTime定义为时间点4,在该时间点之前这个值不应该为null,并且认为初始化为new Date()比较好。同时他认为应保持原先的设计,必要时可以对该值进行多次更新。

    我的观点

    我也将TaskInstance中的startTime定义为时间点4,但是我认为在该时间点之前这个值应该保持为null,并且在一个任务实例的生命周期中,这个值有且只有一次被赋值,赋值后不再被更新。

    我的理由有以下几点:

    一个变量尚未满足它的定义之前不应该被赋值;
    多次更新将导致多次读取数据不一致,例如前面提到的查看任务已运行时间的例子;
    TaskInstance中有一个成员submitTime,在原先的设计中它的值跟startTime非常接近,在系统负载较小的时候,它们俩几乎总是保持一致。所以在startTime初始化之前需要用到的地方可以使用submitTime代替;
    但是,如果代码某处需要用到初始化之前的startTime,那么这个时间是否应该叫做startTime就有待商榷了;
    使用IDEA进行代码分析,startTime很少被用到。

    此外,我发现在查看甘特图的代码中,有一行代码是这么写的:

    Date startTime = taskInstance.getStartTime() == null ? new Date() : taskInstance.getStartTime();

    在这里就考虑到了startTime是可以为null的,这从侧面佐证了我的观点。

    上面的问题总结起来就是:系统原先的设计在任务开始时间的定义上就是模棱两可的。

    改进带来的影响

    可能会用到尚未初始化的startTime的代码我发现两处:

    dolphinscheduler-server/../server/zk/ZKMasterClient.java中的checkTaskAfterWorkerStart()
    dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java中的getRemaintime()

    对于第一处,由于我还不熟悉这部分代码,不知道增加一个判断能否完成原有的逻辑。但是对于第二处,由于worker可能会在时间点2到5之间的任意时刻奔溃,如果奔溃之前没有发送ACK将startTime初始化,master将无法判断是否超时并作出相应的动作。对于这个问题我暂时没有好的解决方案,欢迎大家提意见。

    最后

    对于taskInstance的startTime,大家觉得如何初始化比较好?是否需要按照我的观点进行改进?或者是否保持原有的设计?或者有其他的建议?

    PS:

    对于taskInstance的endTime也有同样的问题,worker在执行完任务之后,会sendResult更新这个时间,但master在waitTaskQuit之后也会去更新一次这个时间。