Posted to dev@dolphinscheduler.apache.org by Yichao Yang <10...@qq.com> on 2020/08/30 04:41:39 UTC

[DISCUSS][api] Workflow datasource and resource version control

Hi all,


The first version of version control has been implemented, but it currently only covers workflow definitions; data sources and resource files are not versioned yet. This discussion therefore focuses on version control for data sources and resource files.


My views are as follows:


1. Version control of data source:


Current implementation: at present, a data source is referenced by storing its ID directly in `process_definition_json`.


Planned implementation: the first option is to replace the data source ID with the data source's context information (connection details) stored inline in `process_definition_json`; the advantage is that this is very simple and easy to understand. The second option is to create a new data source version table and record, in `process_definition_json`, the data source version used by each workflow version. Both options require an upgrade script so that existing definitions can be migrated from the old format to the new one.
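
To make the two options concrete, here is a minimal sketch of what a task's data source reference inside `process_definition_json` could carry under each option. All class, field, and table names (e.g. `VersionedDatasourceRef`, the version table) are hypothetical and only illustrate the shape of the data, not the actual DolphinScheduler schema:

```java
// Option 1: inline the data source context into the task node, so each workflow
// version carries its own snapshot of the connection information.
class InlineDatasource {
    String type;              // e.g. "MYSQL"
    String host;
    int port;
    String database;
    String user;
    String encryptedPassword; // still subject to the token problem described below
}

// Option 2: keep only a versioned reference in the JSON and resolve it against a
// new data source version table (table name would be hypothetical) at execution time.
class VersionedDatasourceRef {
    int datasourceId;      // existing data source ID
    int datasourceVersion; // version recorded when the workflow definition was saved
}
```

The trade-off is that option 1 duplicates connection details into every workflow version, while option 2 keeps a single source of truth but needs an extra lookup against the new table at execution time.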


Existing problems: first, if the encryption/decryption token is changed, the stored data source password (and other encrypted fields) can no longer be parsed correctly; this problem already exists today. Second, if the data source information changes, the workflow definition cannot perceive the change; this problem exists with either of the implementations above.
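
To illustrate the first problem, the sketch below uses a generic AES scheme to show why a password encrypted under one token cannot be decrypted after the token changes. This is only an analogy; DolphinScheduler's actual password encryption may use a different algorithm and key derivation:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class TokenChangeDemo {
    // Derive a 128-bit AES key from a token string (illustrative only).
    static SecretKeySpec keyFromToken(String token) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(token.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
    }

    public static void main(String[] args) throws Exception {
        String password = "db-password";

        // Encrypt the data source password with the token configured at save time.
        Cipher enc = Cipher.getInstance("AES/ECB/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, keyFromToken("old-token"));
        byte[] stored = enc.doFinal(password.getBytes(StandardCharsets.UTF_8));

        // Later the token is changed; decrypting the stored value with the new key
        // typically throws BadPaddingException, or yields garbage bytes.
        Cipher dec = Cipher.getInstance("AES/ECB/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, keyFromToken("new-token"));
        dec.doFinal(stored); // fails: the original password is unrecoverable
    }
}
```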


2. Resource version control:


Current implementation: at present, a resource is referenced by storing its ID directly in `process_definition_json`.


Planned implementation: the first option is to store the resource's file system (FS) storage path directly in `process_definition_json`; the advantage is that this is very simple. The second option is to create a new resource version table and record, in `process_definition_json`, the resource version used by each workflow version. Both options require an upgrade script so that existing definitions can be migrated from the old format to the new one.
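
Analogously to the data source case, here is a minimal sketch of the two resource options; the class names, version table, and the example path are hypothetical, for illustration only:

```java
// Option 1: store the resource's FS storage path directly in the task node; the
// path recorded at save time is what the task uses later (path below is illustrative).
class ResourcePathRef {
    String resourcePath; // e.g. "/dolphinscheduler/resources/etl.jar"
}

// Option 2: keep a versioned reference in the JSON and resolve it against a new
// resource version table (table name would be hypothetical) at execution time.
class VersionedResourceRef {
    int resourceId;      // existing resource ID
    int resourceVersion; // version recorded when the workflow definition was saved
}
```

Either way, the deletion problem described below remains: the JSON only records a reference, and nothing prevents the underlying file from being removed.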


Existing problem: if the resource changes, for example it is deleted manually, the workflow definition cannot perceive the change; this problem exists with either of the implementations above.





If you have a better approach or idea, feel free to reply and leave a message.



Best,
Yichao Yang