Posted to dev@dolphinscheduler.apache.org by Hemin Wen <we...@apache.org> on 2021/07/01 09:44:33 UTC

[DISCUSS] Create workflow, front end and back end interaction optimization(Json split project)

Hi!

Json split design reference:
https://lists.apache.org/thread.html/r84ed046de63899476f067dd207496428812a8687045bb4485072ee2d%40%3Cdev.dolphinscheduler.apache.org%3E

This is a redesign addressing problems in the current interaction design. If
you have a different opinion, please join the discussion.
If you support the proposal, please +1.

Design objective:

    - Remove the processDefinitionJson field from the API when creating /
editing a workflow

    - When viewing a workflow, the back end no longer needs to return task
detail data, only the task relation data (see the sketch after this list)

    - When editing a workflow, support referencing tasks that already exist
in the database

    - When creating / editing a workflow, the front end only sends the
necessary task information; for referenced tasks it does not send task
detail data, which reduces the interaction payload and improves interaction
efficiency
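
To make the second objective concrete, here is a minimal sketch of what a
trimmed "view workflow" response could carry. The class and field names
(ProcessDefinitionView, TaskRelationView) are hypothetical placeholders for
illustration, not the actual DolphinScheduler response types.

    // Hypothetical sketch of a trimmed "view workflow" response; names are
    // illustrative only, not the real DolphinScheduler API types.
    import java.util.List;

    class TaskRelationView {
        long preTaskCode;                     // 0 when the node has no upstream task
        long postTaskCode;
    }

    class ProcessDefinitionView {
        long code;
        String name;
        List<TaskRelationView> taskRelations; // only relation data is returned
        String locations;                     // node coordinates, keyed by task code
        // Task detail data is intentionally absent: details are queried per node
        // on demand, e.g. when the user double-clicks a task node in the DAG editor.
    }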

Current design:

    - Creating / editing / copying a task node calls the back-end task
interface, which maintains a task record

    - After building the DAG, the user clicks Save workflow; at this point
the front end does not need to send task detail data, only the task codes

Problems with the current design:

        When editing the DAG, the front end supports right-clicking a node
to copy it and generate a new node. Under the current design, this also has
to call the back-end interface to create a task. In other words, every node
added while creating a workflow produces a task record in the database (the
workflow details page does not allow deleting database tasks, because there
may be hidden dependencies). Inevitably, some of these nodes are temporary,
or are added and then deleted. Over time this orphaned task data accumulates
and becomes harder and harder to maintain.

Design optimization:

    - The back end exposes its unique-code generation as an interface

    - When editing the DAG and adding / copying a node, the front end calls
the unique-code interface to generate the task code; the version is left
empty

    - When editing the DAG and referencing an existing task, the task's code
and version are available directly

    - When the workflow is saved, only the code and version are sent for
referenced tasks, while the full task information is sent for new tasks (a
payload sketch follows this list)

    - For node relation data, coordinate data and similar fields, the front
end no longer generates a task client ID by the "task-<random number>" rule;
task codes are used everywhere instead

    - When querying workflow details, the back end does not need to return
task detail data; double-clicking a task node calls the back-end interface
to query the task details
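
As referenced in the list above, a minimal sketch of what the save-workflow
payload could look like under this design. All names here (SaveWorkflowRequest,
TaskReference, NewTaskDefinition, TaskRelation) are hypothetical placeholders,
not the actual DolphinScheduler request classes.

    // Hypothetical sketch of the save-workflow payload under the proposed design;
    // names are illustrative only.
    import java.util.List;
    import java.util.Map;

    class TaskReference {          // referenced task: only code and version are sent
        long code;
        int version;
    }

    class NewTaskDefinition {      // newly added task: the full definition is sent
        long code;                 // obtained from the back end's unique-code interface
        Integer version;           // left empty (null) until the back end assigns one
        String name;
        String taskType;
        Map<String, Object> taskParams;
    }

    class TaskRelation {           // relations are keyed by task code, replacing the
        long preTaskCode;          // old "task-<random number>" client-side IDs
        long postTaskCode;
    }

    class SaveWorkflowRequest {
        String name;
        List<TaskReference> referencedTasks;
        List<NewTaskDefinition> newTasks;
        List<TaskRelation> taskRelations;
        String locations;          // node coordinate data, also keyed by task code
    }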


--------------------
Apache DolphinScheduler Committer
Hemin Wen  温合民
wenhemin@apache.org
--------------------

Re: [DISCUSS] Create workflow, front end and back end interaction optimization(Json split project)

Posted by "jiny.li@foxmail.com" <ji...@foxmail.com>.
+1



jiny.li@foxmail.com
 

Re: [DISCUSS] Create workflow, front end and back end interaction optimization(Json split project)

Posted by Lidong Dai <da...@gmail.com>.
+1. The concern about multiple people modifying the same workflow at the same
time is a lower priority.


Best Regards



---------------
Apache DolphinScheduler PMC Chair
David
lidongdai@apache.org
Linkedin: https://www.linkedin.com/in/dailidong
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------


Re: [DISCUSS] Create workflow, front end and back end interaction optimization(Json split project)

Posted by Hemin Wen <we...@apache.org>.
Hi, Chen!

The scenario of multiple people modifying the same workflow at the same time
also exists in the current design.
The probability of this happening is very low; at least nobody has raised it
as a problem so far.

I think it can be solved by adding row-level locks in the database.
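
For illustration, a minimal sketch of such a row-level lock with plain JDBC.
The table and column names here are assumptions for the example and may not
match the real schema.

    // Row-level lock sketch using plain JDBC; table/column names are assumed.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class WorkflowEditLock {

        // Locks the workflow row, applies the edit while the lock is held, then commits.
        static void saveWithRowLock(Connection conn, long processCode) throws SQLException {
            conn.setAutoCommit(false);
            try {
                try (PreparedStatement lock = conn.prepareStatement(
                        "SELECT version FROM t_ds_process_definition WHERE code = ? FOR UPDATE")) {
                    lock.setLong(1, processCode);
                    try (ResultSet rs = lock.executeQuery()) {
                        if (!rs.next()) {
                            conn.rollback();          // workflow no longer exists
                            return;
                        }
                        int version = rs.getInt("version");
                        // ... write the edited definition and task relations here,
                        //     while concurrent editors of the same row are blocked ...
                        try (PreparedStatement bump = conn.prepareStatement(
                                "UPDATE t_ds_process_definition SET version = ? WHERE code = ?")) {
                            bump.setInt(1, version + 1);
                            bump.setLong(2, processCode);
                            bump.executeUpdate();
                        }
                    }
                }
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }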


--------------------
Apache DolphinScheduler Committer
Hemin Wen  温合民
wenhemin@apache.org
--------------------



Re: [DISCUSS] Create workflow, front end and back end interaction optimization(Json split project)

Posted by Jave-Chen <ke...@foxmail.com>.
Hi, Wen
In a concurrent scenario, if two or more people edit a
workflow DAG at the same time, will there be a problem
of locking the database table?


Re:[DISCUSS] Create workflow, front end and back end interaction optimization(Json split project)

Posted by Xingfei <fl...@163.com>.
+1