Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/06/24 10:57:04 UTC

[GitHub] [dolphinscheduler] caishunfeng opened a new pull request, #10600: [doc] Update metadata and design doc

caishunfeng opened a new pull request, #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600

   ## Purpose of the pull request
   
   Some content in the metadata and design docs is out of date.
   
   ## Brief change log
   
   Update the metadata and design docs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] caishunfeng commented on a diff in pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
caishunfeng commented on code in PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#discussion_r907012201


##########
docs/docs/en/architecture/design.md:
##########
@@ -46,27 +52,26 @@
      Server provides monitoring services based on netty.
   
      #### The Service Mainly Includes:
-  
-     - **Fetch TaskThread** is mainly responsible for continuously getting tasks from the **Task Queue**, and calling **TaskScheduleThread** corresponding executor according to different task types.
+
+    - **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing;
+
+    - **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
+
+    - **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the status of the task to the Master until the Master replies to the status ack to avoid the loss of the task status;

Review Comment:
   done





[GitHub] [dolphinscheduler] caishunfeng commented on a diff in pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
caishunfeng commented on code in PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#discussion_r907012014


##########
docs/docs/en/architecture/design.md:
##########
@@ -29,14 +29,20 @@
     MasterServer provides monitoring services based on netty.
 
     #### The Service Mainly Includes:
+  
+    - **Distributed Quartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;

Review Comment:
   done





[GitHub] [dolphinscheduler] caishunfeng commented on a diff in pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
caishunfeng commented on code in PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#discussion_r907012437


##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
   
 ### Project Resource Alert
 
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in t_ds_relation_project_user table.
-- The user_id in the t_ds_projcet table represents the user who create the project, and the user_id in the t_ds_relation_project_user table represents users who have permission to the project.
-- The user_id in the t_ds_resources table represents the user who create the resource, and the user_id in t_ds_relation_resources_user represents the user who has permissions to the resource.
-- The user_id in the t_ds_udfs table represents the user who create the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
-  
-### Command Process Task
-
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
-- The t_ds_schedulers table stores the specified time schedule information for process definition.
-- The data stored in the t_ds_relation_process_instance table is used to deal with the sub-processes of a process definition, parent_process_instance_id field represents the id of the main process instance who contains child processes, process_instance_id field represents the id of the sub-process instance, parent_task_instance_id field represents the task instance id of the sub-process node.
-- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
-
----
-
-## Core Table Schema
-
-### t_ds_process_definition
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state:0:offline,1:online |
-| project_id | int | project id |
-| user_id | int | process definition creator id |
-| process_definition_json | longtext | process definition JSON content |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether process available: 0 not available, 1 available |
-| locations | text | Node location information |
-| connects | text | Node connection information |
-| receivers | text | receivers |
-| receivers_cc | text | carbon copy list |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | define user modify the process |
-| resource_ids | varchar | resource id set |
-
-### t_ds_process_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | process instance run times |
-| host | varchar | process instance host |
-| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| max_try_times | tinyint | max try times |
-| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
-| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters |
-| process_instance_json | longtext | process instance JSON |
-| flag | tinyint | whether process instance is available: 0 not available, 1 available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
-| executor_id | int | executor id |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, record all the commands to a instance |
-| dependence_schedule_times | text | depend schedule estimate time |
-| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-
-### t_ds_task_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task content JSON |
-| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host of task running on |
-| execute_path | varchar | task execute path in the host |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether alert |
-| retry_times | int | task retry times |
-| pid | int | pid of task |
-| app_link | varchar | Yarn app id |
-| flag | tinyint | task instance is available : 0 not available, 1 available |
-| retry_interval | int | retry interval when task failed |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
+- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in `t_ds_relation_project_user` table.
+- The user_id in the `t_ds_projcet` table represents the user who create the project, and the user_id in the `t_ds_relation_project_user` table represents users who have permission to the project.
+- The user_id in the `t_ds_resources` table represents the user who create the resource, and the user_id in `t_ds_relation_resources_user` represents the user who has permissions to the resource.
+- The user_id in the `t_ds_udfs` table represents the user who create the UDF, and the user_id in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.
 
-#### t_ds_schedules
+### Project - Tenant - ProcessDefinition - Schedule
+![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
 
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| process_definition_id | int | process definition id |
-| start_time | datetime | schedule start time |
-| end_time | datetime | schedule end time |
-| crontab | varchar | crontab expression |
-| failure_strategy | tinyint | failure strategy: 0 end,1 continue |
-| user_id | int | user id |
-| release_state | tinyint | release status: 0 not yet released,1 released |
-| warning_type | tinyint | warning type: 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| process_instance_priority | int | process instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| create_time | datetime | create time |
-| update_time | datetime | update time |
+- A project can have multiple process definitions, and each process definition belongs to only one project.
+- A tenant can be used by multiple process definitions, and each process definition must select only one tenant.
+- A workflow definition can have one or more schedules.
 
-### t_ds_command
+### Process Definition Execution
+![image.png](../../../img/metadata-erd/process_definition.png)
 
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| command_type | tinyint | command type: 0 start workflow, 1 start execution from current node, 2 resume fault-tolerant workflow, 3 resume pause process, 4 start execution from failed node, 5 complement, 6 schedule, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| process_definition_id | int | process definition id |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| failure_strategy | tinyint | failed policy: 0 end, 1 continue |
-| warning_type | tinyint | alarm type: 0 no alarm, 1 alarm if process success, 2: alarm if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| start_time | datetime | start time |
-| executor_id | int | executor id |
-| dependence | varchar | dependence column |
-| update_time | datetime | update time |
-| process_instance_priority | int | process instance priority: 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group_id | int |  worker group who assign the task |
\ No newline at end of file
+- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is code + version. When the pre-task of the task is empty, the corresponding pre_task_node and pre_task_version are 0.

Review Comment:
   done





[GitHub] [dolphinscheduler] zhongjiajie commented on pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#issuecomment-1167147558

   > > `Docs / Image Check (pull_request) Failing after 5s — Image Check`
   > > does this have any impact
   > 
   > I have the same question, PTAL @zhongjiajie
   
   * difference `img` imgs to `docs` is: {'/img/metadata-erd/process-task.png', '/img/metadata-erd/command.png', '/img/distributed_lock_procss.png'}
   
   This means you should remove those images because they are no longer needed.
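
The reported difference can be read as a set difference: images that exist under `img` but are no longer referenced by any doc. Below is a minimal sketch of that kind of comparison. It is not the project's actual Image Check script; the function name `unused_images`, the `IMG_REF` pattern, and the list of file extensions are assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches references such as /img/metadata-erd/command.png
# inside markdown text (the extension list is an assumption).
IMG_REF = re.compile(r"/img/[\w./-]+\.(?:png|jpg|jpeg|gif|svg)")


def unused_images(image_paths, doc_texts):
    """Return the image paths that none of the given doc texts reference."""
    referenced = set()
    for text in doc_texts:
        referenced.update(IMG_REF.findall(text))
    # Images present in the repository but never referenced by the docs.
    return set(image_paths) - referenced


if __name__ == "__main__":
    images = {
        "/img/metadata-erd/command.png",
        "/img/metadata-erd/user-queue-datasource.png",
    }
    docs = ["![image.png](../../../img/metadata-erd/user-queue-datasource.png)"]
    print(unused_images(images, docs))
```

Under these assumptions, any path left in the result is a candidate for deletion, which matches the advice above to remove the listed images.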




[GitHub] [dolphinscheduler] Tianqi-Dotes commented on a diff in pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
Tianqi-Dotes commented on code in PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#discussion_r906913784


##########
docs/docs/en/architecture/design.md:
##########
@@ -29,14 +29,20 @@
     MasterServer provides monitoring services based on netty.
 
     #### The Service Mainly Includes:
+  
+    - **Distributed Quartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;
+
+    - **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
+
+    - **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
 
-    - **Distributed Quartz** distributed scheduling component, which is mainly responsible for the start and stop operations of schedule tasks. When Quartz starts the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task.
+    - **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
 
-    - **MasterSchedulerThread** is a scanning thread that regularly scans the **command** table in the database and runs different business operations according to different **command types**.
+    - **EventExecuteService** is mainly responsible for the polling of the event queue of the process instance;
 
-    - **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing to different command types.
+    - **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance;

Review Comment:
   is mainly responsible for process instance
   ->
   is mainly responsible for instance processing



##########
docs/docs/en/architecture/design.md:
##########
@@ -29,14 +29,20 @@
     MasterServer provides monitoring services based on netty.
 
     #### The Service Mainly Includes:
+  
+    - **Distributed Quartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;

Review Comment:
   Consider either splitting the words or joining them, but be consistent.
   Pick one:
   Distributed Quartz
   or
   DistributedQuartz
   The same problem exists in the Chinese (cn) version.



##########
docs/docs/zh/architecture/design.md:
##########
@@ -28,49 +28,55 @@
 
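     ##### The Service Mainly Includes:
 
-    - **Distributed Quartz** distributed scheduling component, mainly responsible for starting and stopping scheduled tasks. After quartz triggers a task, a thread pool inside the Master handles the follow-up processing of the task
+    - **Distributed Quartz** distributed scheduling component, mainly responsible for starting and stopping scheduled tasks. After quartz triggers a task, a thread pool inside the Master handles the follow-up processing of the task;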

Review Comment:
   **Distributed Quartz**
   same as above



##########
docs/docs/en/architecture/design.md:
##########
@@ -46,27 +52,26 @@
      Server provides monitoring services based on netty.
   
      #### The Service Mainly Includes:
-  
-     - **Fetch TaskThread** is mainly responsible for continuously getting tasks from the **Task Queue**, and calling **TaskScheduleThread** corresponding executor according to different task types.
+
+    - **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing;
+
+    - **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
+
+    - **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the status of the task to the Master until the Master replies to the status ack to avoid the loss of the task status;

Review Comment:
    to report the status of the task
   ->
    to report the task status



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.

Review Comment:
   Suggest wrapping column names in backticks, for example:
   `queue_name` 
   `queue_id`
   



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
   
 ### Project Resource Alert
 
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in t_ds_relation_project_user table.
-- The user_id in the t_ds_projcet table represents the user who create the project, and the user_id in the t_ds_relation_project_user table represents users who have permission to the project.
-- The user_id in the t_ds_resources table represents the user who create the resource, and the user_id in t_ds_relation_resources_user represents the user who has permissions to the resource.
-- The user_id in the t_ds_udfs table represents the user who create the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
-  
-### Command Process Task
-
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
-- The t_ds_schedulers table stores the specified time schedule information for process definition.
-- The data stored in the t_ds_relation_process_instance table is used to deal with the sub-processes of a process definition, parent_process_instance_id field represents the id of the main process instance who contains child processes, process_instance_id field represents the id of the sub-process instance, parent_task_instance_id field represents the task instance id of the sub-process node.
-- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
-
----
-
-## Core Table Schema
-
-### t_ds_process_definition
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state:0:offline,1:online |
-| project_id | int | project id |
-| user_id | int | process definition creator id |
-| process_definition_json | longtext | process definition JSON content |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether process available: 0 not available, 1 available |
-| locations | text | Node location information |
-| connects | text | Node connection information |
-| receivers | text | receivers |
-| receivers_cc | text | carbon copy list |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | define user modify the process |
-| resource_ids | varchar | resource id set |
-
-### t_ds_process_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | process instance run times |
-| host | varchar | process instance host |
-| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| max_try_times | tinyint | max try times |
-| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
-| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters |
-| process_instance_json | longtext | process instance JSON |
-| flag | tinyint | whether process instance is available: 0 not available, 1 available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
-| executor_id | int | executor id |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, record all the commands to a instance |
-| dependence_schedule_times | text | depend schedule estimate time |
-| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-
-### t_ds_task_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task content JSON |
-| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host of task running on |
-| execute_path | varchar | task execute path in the host |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether alert |
-| retry_times | int | task retry times |
-| pid | int | pid of task |
-| app_link | varchar | Yarn app id |
-| flag | tinyint | task instance is available : 0 not available, 1 available |
-| retry_interval | int | retry interval when task failed |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
+- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in `t_ds_relation_project_user` table.
+- The user_id in the `t_ds_projcet` table represents the user who create the project, and the user_id in the `t_ds_relation_project_user` table represents users who have permission to the project.
+- The user_id in the `t_ds_resources` table represents the user who create the resource, and the user_id in `t_ds_relation_resources_user` represents the user who has permissions to the resource.

Review Comment:
   same as above



##########
docs/docs/en/architecture/design.md:
##########
@@ -188,11 +158,11 @@ Here we must first distinguish the concepts of task failure retry, process failu
 
 Next to the main point, we divide the task nodes in the workflow into two types.
 
-- One is a business node, which corresponds to an actual script or process command, such as shell node, MR node, Spark node, and dependent node.
+- One is a business task, which corresponds to an actual script or process command, such as Shell task, SQL task, and Spark task.
 
-- Another is a logical node, which does not operate actual script or process command, but only logical processing to the entire process flow, such as sub-process sections.
+- Another is a logical task, which does not operate actual script or process command, but only logical processing to the entire process flow, such as sub-process task, dependent task.
 
-Each **business node** can configure the number of failed retries. When the task node fails, it will automatically retry until it succeeds or exceeds the retry times. **Logical node** failure retry is not supported, but the tasks in the logical node support.
+**business node** can configure the number of failed retries. When the task node fails, it will automatically retry until it succeeds or exceeds the retry times.**Logical node** failure retry is not supported.

Review Comment:
   Maybe add a space before **Logical node**.
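
For readers following the hunk above, the retry rule it describes (a business task retries until it succeeds or exceeds its retry count, while a logical task is not retried) can be sketched roughly as below. This is an illustrative Python sketch, not DolphinScheduler code: `run_with_retry` and the `is_logical` flag are hypothetical names, and `max_retry_times` / `retry_interval` are borrowed from the column names in the removed `t_ds_task_instance` table only as parameter names.

```python
import time
from typing import Callable, Tuple


def run_with_retry(task: Callable[[], bool],
                   max_retry_times: int,
                   retry_interval: float = 0.0,
                   is_logical: bool = False) -> Tuple[bool, int]:
    """Run `task` and retry on failure; return (succeeded, attempts)."""
    # Assumption: a logical task gets exactly one attempt (no failure retry).
    allowed_retries = 0 if is_logical else max_retry_times
    attempts = 0
    while True:
        attempts += 1
        if task():
            return True, attempts
        # Stop once the retries allowed after the first attempt are used up.
        if attempts > allowed_retries:
            return False, attempts
        time.sleep(retry_interval)
```

With `max_retry_times=2`, a task that always fails is attempted three times in total: the initial run plus two retries.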



##########
docs/docs/zh/architecture/metadata.md:
##########
@@ -1,185 +1,37 @@
-# Dolphin Scheduler 1.3 Metadata Documentation
+# DolphinScheduler Metadata Documentation
 
-<a name="25Ald"></a>
-### Table Overview
-| Table Name | Table Information |
-| :---: | :---: |
-| t_ds_access_token | token for accessing the ds backend |
-| t_ds_alert | alert information |
-| t_ds_alertgroup | alert group |
-| t_ds_command | execution command |
-| t_ds_datasource | data source |
-| t_ds_error_command | error command |
-| t_ds_process_definition | process definition |
-| t_ds_process_instance | process instance |
-| t_ds_project | project |
-| t_ds_queue | queue |
-| t_ds_relation_datasource_user | data sources associated with users |
-| t_ds_relation_process_instance | sub-process |
-| t_ds_relation_project_user | projects associated with users |
-| t_ds_relation_resources_user | resources associated with users |
-| t_ds_relation_udfs_user | UDF functions associated with users |
-| t_ds_relation_user_alertgroup | alert groups associated with users |
-| t_ds_resources | resource file |
-| t_ds_schedules | process timed scheduling |
-| t_ds_session | user login session |
-| t_ds_task_instance | task instance |
-| t_ds_tenant | tenant |
-| t_ds_udfs | UDF resource |
-| t_ds_user | user |
-| t_ds_version | ds version information |
+## Table Schema
+See the SQL files in the `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql` directory
+
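+## E-R Diagram
 
-<a name="VNVGr"></a>
 ### User Queue Datasource
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can have multiple users<br />
-- The queue field in t_ds_user stores the queue_name from the queue table, while t_ds_tenant stores queue_id. During the execution of a process definition, the user queue has the highest priority; if the user queue is empty, the tenant queue is used<br />
-- The user_id field in the t_ds_datasource table is the user who created the data source; the user_id in t_ds_relation_datasource_user is, a user who has permission to the data source<br />
-<a name="HHyGV"></a>
+- The queue field in `t_ds_user` stores the queue_name from the queue table, while `t_ds_tenant` stores queue_id. During the execution of a process definition, the user queue has the highest priority; if the user queue is empty, the tenant queue is used<br />
+- The user_id field in the `t_ds_datasource` table is the user who created the data source; the user_id in `t_ds_relation_datasource_user` is a user who has permission to the data source<br />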
+
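 ### Project Resource Alert
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- A user can have multiple projects; user project authorization binds project_id and user_id through the t_ds_relation_project_user table<br />
-- The user_id in the t_ds_projcet table is the user who created the project; the user_id in the t_ds_relation_project_user table is a user who has permission to the project<br />
-- The user_id in the t_ds_resources table is the user who created the resource; the user_id in t_ds_relation_resources_user is a user who has permission to the resource<br />
-- The user_id in the t_ds_udfs table is the user who created the UDF; the user_id in the t_ds_relation_udfs_user table is a user who has permission to the UDF<br />
-<a name="Bg2Sn"></a>
-### Command Process Task
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances<br />
-- The t_ds_schedulers table stores the timed scheduling information of process definitions<br />
-- The data stored in the t_ds_relation_process_instance table is used to handle process definitions that contain sub-processes: parent_process_instance_id is the id of the main process instance that contains the sub-process, process_instance_id is the id of the sub-process instance, and parent_task_instance_id is the task instance id of the sub-process node. The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively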
-<a name="Pv25P"></a>
-### Core Table Schema
-<a name="32Jzd"></a>
-#### t_ds_process_definition
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state: 0 offline, 1 online |
-| project_id | int | project id |
-| user_id | int | id of the user who owns the process definition |
-| process_definition_json | longtext | process definition JSON string |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether the process is available: 0 not available, 1 available |
-| locations | text | node location information |
-| connects | text | node connection information |
-| receivers | text | recipients |
-| receivers_cc | text | carbon copy recipients |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | user who made the modification |
-| resource_ids | varchar | resource id set |
-
-<a name="e6jfz"></a>
-#### t_ds_process_instance
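-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance state: 0 submitted successfully, 1 running, 2 preparing to pause, 3 paused, 4 preparing to stop, 5 stopped, 6 failed, 7 succeeded, 8 needs fault tolerance, 9 kill, 10 waiting for thread, 11 waiting for dependency to complete |
-| recovery | tinyint | process instance failover flag: 0 normal, 1 needs to be restarted by failover |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | number of times the process instance has run |
-| host | varchar | host where the process instance runs |
-| command_type | tinyint | command type: 0 start workflow, 1 start execution from the current node, 2 recover a fault-tolerant workflow, 3 resume a paused process, 4 start execution from the failed node, 5 complement data, 6 schedule, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | command parameters (JSON format) |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 execute forward, 2 execute backward |
-| max_try_times | tinyint | max retry times |
-| failure_strategy | tinyint | failure strategy: 0 end after failure, 1 continue after failure |
-| warning_type | tinyint | warning type: 0 do not send, 1 send on process success, 2 send on process failure, 3 send on both success and failure |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | expected run time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters (frozen from the process definition) |
-| process_instance_json | longtext | process instance JSON (a copy of the process definition JSON) |
-| flag | tinyint | whether available: 1 available, 0 not available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether it is a sub-workflow: 1 yes, 0 no |
-| executor_id | int | user who executes the command |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, recording all operations on the process instance |
-| dependence_schedule_times | text | estimated time of dependent nodes |
-| process_instance_priority | int | process instance priority: 0 Highest, 1 High, 2 Medium, 3 Low, 4 Lowest |
-| worker_group | varchar | worker group designated to run the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
+- A user can have multiple projects; user project authorization binds project_id and user_id through the `t_ds_relation_project_user` table<br />
+- The user_id in the `t_ds_projcet` table is the user who created the project; the user_id in the `t_ds_relation_project_user` table is a user who has permission to the project<br />
+- The user_id in the `t_ds_resources` table is the user who created the resource; the user_id in `t_ds_relation_resources_user` is a user who has permission to the resource<br />
+- The user_id in the `t_ds_udfs` table is the user who created the UDF; the user_id in the `t_ds_relation_udfs_user` table is a user who has permission to the UDF<br />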
 
-<a name="IvHEc"></a>
-#### t_ds_task_instance
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task node JSON |
-| state | tinyint | task instance state: 0 submitted successfully, 1 running, 2 preparing to pause, 3 paused, 4 preparing to stop, 5 stopped, 6 failed, 7 succeeded, 8 needs fault tolerance, 9 kill, 10 waiting for thread, 11 waiting for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host that executes the task |
-| execute_path | varchar | task execution path |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether to alert |
-| retry_times | int | retry times |
-| pid | int | process pid |
-| app_link | varchar | yarn app id |
-| flag | tinyint | whether available: 0 not available, 1 available |
-| retry_interval | int | retry interval |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority: 0 Highest, 1 High, 2 Medium, 3 Low, 4 Lowest |
-| worker_group | varchar | worker group designated to run the task |
+### Project - Tenant - Workflow Definition - Schedule
+![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
 
-<a name="pPQkU"></a>
-#### t_ds_schedules
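-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| process_definition_id | int | process definition id |
-| start_time | datetime | schedule start time |
-| end_time | datetime | schedule end time |
-| crontab | varchar | crontab expression |
-| failure_strategy | tinyint | failure strategy: 0 end, 1 continue |
-| user_id | int | user id |
-| release_state | tinyint | state: 0 offline, 1 online |
-| warning_type | tinyint | warning type: 0 do not send, 1 send on process success, 2 send on process failure, 3 send on both success and failure |
-| warning_group_id | int | warning group id |
-| process_instance_priority | int | process instance priority: 0 Highest, 1 High, 2 Medium, 3 Low, 4 Lowest |
-| worker_group | varchar | worker group designated to run the task |
-| create_time | datetime | create time |
-| update_time | datetime | update time |
+- A project can have multiple workflow definitions, and each workflow definition belongs to only one project;<br />
+- A tenant can be used by multiple workflow definitions, and each workflow definition must select exactly one tenant<br />
+- A workflow definition can have one or more schedule configurations<br />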
 
-<a name="TkQzn"></a>
-#### t_ds_command
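-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| command_type | tinyint | command type: 0 start workflow, 1 start execution from the current node, 2 recover a fault-tolerant workflow, 3 resume a paused process, 4 start execution from the failed node, 5 complement data, 6 schedule, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| process_definition_id | int | process definition id |
-| command_param | text | command parameters (JSON format) |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 execute forward, 2 execute backward |
-| failure_strategy | tinyint | failure strategy: 0 end, 1 continue |
-| warning_type | tinyint | warning type: 0 do not send, 1 send on process success, 2 send on process failure, 3 send on both success and failure |
-| warning_group_id | int | warning group |
-| schedule_time | datetime | expected run time |
-| start_time | datetime | start time |
-| executor_id | int | executing user id |
-| dependence | varchar | dependence field |
-| update_time | datetime | update time |
-| process_instance_priority | int | process instance priority: 0 Highest, 1 High, 2 Medium, 3 Low, 4 Lowest |
-| worker_group | varchar | worker group designated to run the task |
+### Workflow Definition and Execution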
+![image.png](../../../img/metadata-erd/process_definition.png)
 
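+- A workflow definition corresponds to multiple task definitions, associated through `t_ds_process_task_relation`; the association key is code + version. When a task has no predecessor node, the corresponding pre_task_node and pre_task_version are 0;
+- A workflow definition can have multiple workflow instances `t_ds_process_instance`, and a workflow instance corresponds to one or more task instances `t_ds_task_instance`;
+- The data stored in the `t_ds_relation_process_instance` table is used to handle process definitions that contain sub-processes: parent_process_instance_id is the id of the main process instance that contains the sub-process, process_instance_id is the id of the sub-process instance, and parent_task_instance_id is the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively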

Review Comment:
   Add ';' at the end.
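
As a side note on the `t_ds_relation_process_instance` bullet in the hunk above, the mapping from a parent process instance and its sub-process task node to the sub-process instance can be pictured as a simple lookup. The sketch below is only an illustration in Python over in-memory rows; `find_sub_process_instance_id` is a hypothetical helper, and only the three column names come from the document.

```python
from typing import Iterable, Mapping, Optional


def find_sub_process_instance_id(
        relations: Iterable[Mapping[str, int]],
        parent_process_instance_id: int,
        parent_task_instance_id: int) -> Optional[int]:
    """Return the sub-process instance id for a given parent, if any.

    Each row mimics one record of t_ds_relation_process_instance with the
    columns parent_process_instance_id, parent_task_instance_id and
    process_instance_id.
    """
    for row in relations:
        if (row["parent_process_instance_id"] == parent_process_instance_id
                and row["parent_task_instance_id"] == parent_task_instance_id):
            return row["process_instance_id"]
    # No sub-process was started from that task node.
    return None
```

For example, a row `{"parent_process_instance_id": 10, "parent_task_instance_id": 101, "process_instance_id": 11}` says that task instance 101 of process instance 10 is a sub-process node whose sub-process runs as process instance 11.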



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
   
 ### Project Resource Alert
 
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in t_ds_relation_project_user table.
-- The user_id in the t_ds_projcet table represents the user who create the project, and the user_id in the t_ds_relation_project_user table represents users who have permission to the project.
-- The user_id in the t_ds_resources table represents the user who create the resource, and the user_id in t_ds_relation_resources_user represents the user who has permissions to the resource.
-- The user_id in the t_ds_udfs table represents the user who create the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
-  
-### Command Process Task
-
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
-- The t_ds_schedulers table stores the specified time schedule information for process definition.
-- The data stored in the t_ds_relation_process_instance table is used to deal with the sub-processes of a process definition, parent_process_instance_id field represents the id of the main process instance who contains child processes, process_instance_id field represents the id of the sub-process instance, parent_task_instance_id field represents the task instance id of the sub-process node.
-- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
-
----
-
-## Core Table Schema
-
-### t_ds_process_definition
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state:0:offline,1:online |
-| project_id | int | project id |
-| user_id | int | process definition creator id |
-| process_definition_json | longtext | process definition JSON content |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether process available: 0 not available, 1 available |
-| locations | text | Node location information |
-| connects | text | Node connection information |
-| receivers | text | receivers |
-| receivers_cc | text | carbon copy list |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | define user modify the process |
-| resource_ids | varchar | resource id set |
-
-### t_ds_process_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | process instance run times |
-| host | varchar | process instance host |
-| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| max_try_times | tinyint | max try times |
-| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
-| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters |
-| process_instance_json | longtext | process instance JSON |
-| flag | tinyint | whether process instance is available: 0 not available, 1 available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
-| executor_id | int | executor id |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, record all the commands to a instance |
-| dependence_schedule_times | text | depend schedule estimate time |
-| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-
-### t_ds_task_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task content JSON |
-| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host of task running on |
-| execute_path | varchar | task execute path in the host |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether alert |
-| retry_times | int | task retry times |
-| pid | int | pid of task |
-| app_link | varchar | Yarn app id |
-| flag | tinyint | task instance is available : 0 not available, 1 available |
-| retry_interval | int | retry interval when task failed |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
+- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in `t_ds_relation_project_user` table.
+- The user_id in the `t_ds_projcet` table represents the user who create the project, and the user_id in the `t_ds_relation_project_user` table represents users who have permission to the project.

Review Comment:
   `user_id`
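
The queue rule quoted in this hunk (the user queue has the highest priority, and the tenant queue is used when the user queue is null) amounts to a simple fallback. A minimal Python sketch, assuming hypothetical names (`resolve_queue`, `user_queue`, `tenant_queue`) and assuming that an empty string counts as "no user queue":

```python
from typing import Optional


def resolve_queue(user_queue: Optional[str],
                  tenant_queue: Optional[str]) -> Optional[str]:
    """Pick the queue for a run: user queue first, tenant queue otherwise."""
    # Assumption: None and "" both mean the user has no queue configured.
    if user_queue:
        return user_queue
    return tenant_queue
```

So `resolve_queue("etl", "default")` picks the user's queue, while `resolve_queue(None, "default")` falls back to the tenant's.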
   



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
   
 ### Project Resource Alert
 
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in t_ds_relation_project_user table.
-- The user_id in the t_ds_projcet table represents the user who create the project, and the user_id in the t_ds_relation_project_user table represents users who have permission to the project.
-- The user_id in the t_ds_resources table represents the user who create the resource, and the user_id in t_ds_relation_resources_user represents the user who has permissions to the resource.
-- The user_id in the t_ds_udfs table represents the user who create the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
-  
-### Command Process Task
-
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
-- The t_ds_schedulers table stores the specified time schedule information for process definition.
-- The data stored in the t_ds_relation_process_instance table is used to deal with the sub-processes of a process definition, parent_process_instance_id field represents the id of the main process instance who contains child processes, process_instance_id field represents the id of the sub-process instance, parent_task_instance_id field represents the task instance id of the sub-process node.
-- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
-
----
-
-## Core Table Schema
-
-### t_ds_process_definition
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state:0:offline,1:online |
-| project_id | int | project id |
-| user_id | int | process definition creator id |
-| process_definition_json | longtext | process definition JSON content |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether process available: 0 not available, 1 available |
-| locations | text | Node location information |
-| connects | text | Node connection information |
-| receivers | text | receivers |
-| receivers_cc | text | carbon copy list |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | define user modify the process |
-| resource_ids | varchar | resource id set |
-
-### t_ds_process_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | process instance run times |
-| host | varchar | process instance host |
-| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| max_try_times | tinyint | max try times |
-| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
-| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters |
-| process_instance_json | longtext | process instance JSON |
-| flag | tinyint | whether process instance is available: 0 not available, 1 available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
-| executor_id | int | executor id |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, record all the commands to a instance |
-| dependence_schedule_times | text | depend schedule estimate time |
-| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-
-### t_ds_task_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task content JSON |
-| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host of task running on |
-| execute_path | varchar | task execute path in the host |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether alert |
-| retry_times | int | task retry times |
-| pid | int | pid of task |
-| app_link | varchar | Yarn app id |
-| flag | tinyint | task instance is available : 0 not available, 1 available |
-| retry_interval | int | retry interval when task failed |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
+- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in `t_ds_relation_project_user` table.
+- The user_id in the `t_ds_projcet` table represents the user who create the project, and the user_id in the `t_ds_relation_project_user` table represents users who have permission to the project.
+- The user_id in the `t_ds_resources` table represents the user who create the resource, and the user_id in `t_ds_relation_resources_user` represents the user who has permissions to the resource.
+- The user_id in the `t_ds_udfs` table represents the user who create the UDF, and the user_id in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.

Review Comment:
   same as above



##########
docs/docs/en/architecture/design.md:
##########
@@ -29,14 +29,20 @@
     MasterServer provides monitoring services based on netty.
 
     #### The Service Mainly Includes:
+  
+    - **Distributed Quartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;
+
+    - **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
+
+    - **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
 
-    - **Distributed Quartz** distributed scheduling component, which is mainly responsible for the start and stop operations of schedule tasks. When Quartz starts the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task.
+    - **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
 
-    - **MasterSchedulerThread** is a scanning thread that regularly scans the **command** table in the database and runs different business operations according to different **command types**.
+    - **EventExecuteService** is mainly responsible for the polling of the event queue of the process instance;

Review Comment:
   process instance
   ->
   process instances



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
   
 ### Project Resource Alert
 
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in t_ds_relation_project_user table.
-- The user_id in the t_ds_projcet table represents the user who create the project, and the user_id in the t_ds_relation_project_user table represents users who have permission to the project.
-- The user_id in the t_ds_resources table represents the user who create the resource, and the user_id in t_ds_relation_resources_user represents the user who has permissions to the resource.
-- The user_id in the t_ds_udfs table represents the user who create the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
-  
-### Command Process Task
-
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
-- The t_ds_schedulers table stores the specified time schedule information for process definition.
-- The data stored in the t_ds_relation_process_instance table is used to deal with the sub-processes of a process definition, parent_process_instance_id field represents the id of the main process instance who contains child processes, process_instance_id field represents the id of the sub-process instance, parent_task_instance_id field represents the task instance id of the sub-process node.
-- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
-
----
-
-## Core Table Schema
-
-### t_ds_process_definition
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state:0:offline,1:online |
-| project_id | int | project id |
-| user_id | int | process definition creator id |
-| process_definition_json | longtext | process definition JSON content |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether process available: 0 not available, 1 available |
-| locations | text | Node location information |
-| connects | text | Node connection information |
-| receivers | text | receivers |
-| receivers_cc | text | carbon copy list |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | define user modify the process |
-| resource_ids | varchar | resource id set |
-
-### t_ds_process_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | process instance run times |
-| host | varchar | process instance host |
-| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| max_try_times | tinyint | max try times |
-| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
-| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters |
-| process_instance_json | longtext | process instance JSON |
-| flag | tinyint | whether process instance is available: 0 not available, 1 available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
-| executor_id | int | executor id |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, record all the commands to a instance |
-| dependence_schedule_times | text | depend schedule estimate time |
-| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-
-### t_ds_task_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task content JSON |
-| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host of task running on |
-| execute_path | varchar | task execute path in the host |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether alert |
-| retry_times | int | task retry times |
-| pid | int | pid of task |
-| app_link | varchar | Yarn app id |
-| flag | tinyint | task instance is available : 0 not available, 1 available |
-| retry_interval | int | retry interval when task failed |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
+- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in `t_ds_relation_project_user` table.

Review Comment:
   Please wrap the field names in backticks: `project_id` and `user_id`.



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.

Review Comment:
   Same as above: please wrap both occurrences of `user_id` in backticks.
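
   For context, the queue rule stated in the hunk above (the user queue has the highest priority, and the tenant queue is used when the user queue is null) can be sketched as below. This is only an illustration: `resolve_queue` and its parameters are hypothetical names, not DolphinScheduler code, and "null" is modelled as Python `None`.

```python
from typing import Optional


def resolve_queue(user_queue: Optional[str], tenant_queue: Optional[str]) -> Optional[str]:
    """Pick the queue used when a process definition is executed.

    Hypothetical helper: the user queue wins whenever it is set, and the
    tenant queue is the fallback when the user queue is null (None here).
    """
    if user_queue is not None:
        return user_queue
    return tenant_queue
```

   With this sketch, `resolve_queue(None, "default")` falls back to the tenant queue, while any non-null user queue is returned unchanged.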



##########
docs/docs/en/architecture/design.md:
##########
@@ -188,11 +158,11 @@ Here we must first distinguish the concepts of task failure retry, process failu
 
 Next to the main point, we divide the task nodes in the workflow into two types.
 
-- One is a business node, which corresponds to an actual script or process command, such as shell node, MR node, Spark node, and dependent node.
+- One is a business task, which corresponds to an actual script or process command, such as Shell task, SQL task, and Spark task.
 
-- Another is a logical node, which does not operate actual script or process command, but only logical processing to the entire process flow, such as sub-process sections.
+- Another is a logical task, which does not operate actual script or process command, but only logical processing to the entire process flow, such as sub-process task, dependent task.
 
-Each **business node** can configure the number of failed retries. When the task node fails, it will automatically retry until it succeeds or exceeds the retry times. **Logical node** failure retry is not supported, but the tasks in the logical node support.
+**business node** can configure the number of failed retries. When the task node fails, it will automatically retry until it succeeds or exceeds the retry times.**Logical node** failure retry is not supported.

Review Comment:
   Please capitalize the start of the sentence: **business node** -> **Business node**
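
   As an aside, the retry rule in this paragraph (a business task retries until it succeeds or exceeds the configured retry times, while a logical task is not retried) could be sketched as follows. `run_task` and its arguments are hypothetical and only model the rule as worded in the doc, not the scheduler's actual implementation.

```python
from typing import Callable, Tuple


def run_task(action: Callable[[], bool], max_retries: int, is_business_task: bool) -> Tuple[bool, int]:
    """Run `action`, retrying on failure, and return (succeeded, attempts).

    Hypothetical helper: a business task gets one initial attempt plus up
    to `max_retries` retries; a logical task gets exactly one attempt.
    """
    allowed_attempts = 1 + (max_retries if is_business_task else 0)
    attempts = 0
    while attempts < allowed_attempts:
        attempts += 1
        if action():
            # Stop as soon as the task succeeds.
            return True, attempts
    # Retry budget exhausted (or retries not supported for this task).
    return False, attempts
```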



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
 ![image.png](../../../img/metadata-erd/user-queue-datasource.png)
 
 - One tenant can own Multiple users.
-- The queue field in the t_ds_user table stores the queue_name information in the t_ds_queue table, t_ds_tenant stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
-- The user_id field in the t_ds_datasource table shows the user who create the data source. The user_id in t_ds_relation_datasource_user shows the user who has permission to the data source.
+- The queue field in the `t_ds_user` table stores the queue_name information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using queue_id column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
+- The user_id field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
   
 ### Project Resource Alert
 
 ![image.png](../../../img/metadata-erd/project-resource-alert.png)
 
-- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in t_ds_relation_project_user table.
-- The user_id in the t_ds_projcet table represents the user who create the project, and the user_id in the t_ds_relation_project_user table represents users who have permission to the project.
-- The user_id in the t_ds_resources table represents the user who create the resource, and the user_id in t_ds_relation_resources_user represents the user who has permissions to the resource.
-- The user_id in the t_ds_udfs table represents the user who create the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
-  
-### Command Process Task
-
-![image.png](../../../img/metadata-erd/command.png)<br />![image.png](../../../img/metadata-erd/process-task.png)
-
-- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
-- The t_ds_schedulers table stores the specified time schedule information for process definition.
-- The data stored in the t_ds_relation_process_instance table is used to deal with the sub-processes of a process definition, parent_process_instance_id field represents the id of the main process instance who contains child processes, process_instance_id field represents the id of the sub-process instance, parent_task_instance_id field represents the task instance id of the sub-process node.
-- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
-
----
-
-## Core Table Schema
-
-### t_ds_process_definition
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process definition name |
-| version | int | process definition version |
-| release_state | tinyint | process definition release state:0:offline,1:online |
-| project_id | int | project id |
-| user_id | int | process definition creator id |
-| process_definition_json | longtext | process definition JSON content |
-| description | text | process definition description |
-| global_params | text | global parameters |
-| flag | tinyint | whether process available: 0 not available, 1 available |
-| locations | text | Node location information |
-| connects | text | Node connection information |
-| receivers | text | receivers |
-| receivers_cc | text | carbon copy list |
-| create_time | datetime | create time |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-| update_time | datetime | update time |
-| modify_by | varchar | define user modify the process |
-| resource_ids | varchar | resource id set |
-
-### t_ds_process_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | process instance name |
-| process_definition_id | int | process definition id |
-| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
-| start_time | datetime | process instance start time |
-| end_time | datetime | process instance end time |
-| run_times | int | process instance run times |
-| host | varchar | process instance host |
-| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| max_try_times | tinyint | max try times |
-| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
-| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| command_start_time | datetime | command start time |
-| global_params | text | global parameters |
-| process_instance_json | longtext | process instance JSON |
-| flag | tinyint | whether process instance is available: 0 not available, 1 available |
-| update_time | timestamp | update time |
-| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
-| executor_id | int | executor id |
-| locations | text | node location information |
-| connects | text | node connection information |
-| history_cmd | text | history commands, record all the commands to a instance |
-| dependence_schedule_times | text | depend schedule estimate time |
-| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| timeout | int | timeout |
-| tenant_id | int | tenant id |
-
-### t_ds_task_instance
-
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| name | varchar | task name |
-| task_type | varchar | task type |
-| process_definition_id | int | process definition id |
-| process_instance_id | int | process instance id |
-| task_json | longtext | task content JSON |
-| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
-| submit_time | datetime | task submit time |
-| start_time | datetime | task start time |
-| end_time | datetime | task end time |
-| host | varchar | host of task running on |
-| execute_path | varchar | task execute path in the host |
-| log_path | varchar | task log path |
-| alert_flag | tinyint | whether alert |
-| retry_times | int | task retry times |
-| pid | int | pid of task |
-| app_link | varchar | Yarn app id |
-| flag | tinyint | task instance is available : 0 not available, 1 available |
-| retry_interval | int | retry interval when task failed |
-| max_retry_times | int | max retry times |
-| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
+- User can have multiple projects, user project authorization completes the relationship binding using project_id and user_id in `t_ds_relation_project_user` table.
+- The user_id in the `t_ds_projcet` table represents the user who create the project, and the user_id in the `t_ds_relation_project_user` table represents users who have permission to the project.
+- The user_id in the `t_ds_resources` table represents the user who create the resource, and the user_id in `t_ds_relation_resources_user` represents the user who has permissions to the resource.
+- The user_id in the `t_ds_udfs` table represents the user who create the UDF, and the user_id in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.
 
-#### t_ds_schedules
+### Project - Tenant - ProcessDefinition - Schedule
+![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
 
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| process_definition_id | int | process definition id |
-| start_time | datetime | schedule start time |
-| end_time | datetime | schedule end time |
-| crontab | varchar | crontab expression |
-| failure_strategy | tinyint | failure strategy: 0 end,1 continue |
-| user_id | int | user id |
-| release_state | tinyint | release status: 0 not yet released,1 released |
-| warning_type | tinyint | warning type: 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| process_instance_priority | int | process instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group | varchar | worker group who assign the task |
-| create_time | datetime | create time |
-| update_time | datetime | update time |
+- A project can have multiple process definitions, and each process definition belongs to only one project.
+- A tenant can be used by multiple process definitions, and each process definition must select only one tenant.
+- A workflow definition can have one or more schedules.
 
-### t_ds_command
+### Process Definition Execution
+![image.png](../../../img/metadata-erd/process_definition.png)
 
-| Field | Type | Comment |
-| --- | --- | --- |
-| id | int | primary key |
-| command_type | tinyint | command type: 0 start workflow, 1 start execution from current node, 2 resume fault-tolerant workflow, 3 resume pause process, 4 start execution from failed node, 5 complement, 6 schedule, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
-| process_definition_id | int | process definition id |
-| command_param | text | JSON command parameters |
-| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
-| failure_strategy | tinyint | failed policy: 0 end, 1 continue |
-| warning_type | tinyint | alarm type: 0 no alarm, 1 alarm if process success, 2: alarm if process failed, 3: warning whatever results |
-| warning_group_id | int | warning group id |
-| schedule_time | datetime | schedule time |
-| start_time | datetime | start time |
-| executor_id | int | executor id |
-| dependence | varchar | dependence column |
-| update_time | datetime | update time |
-| process_instance_priority | int | process instance priority: 0 highest,1 high,2 medium,3 low,4 lowest |
-| worker_group_id | int |  worker group who assign the task |
\ No newline at end of file
+- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is code + version. When the pre-task of the task is empty, the corresponding pre_task_node and pre_task_version are 0.
+- A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
+- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. parent_process_instance_id represents the id of the main process instance containing the sub-process, process_instance_id represents the id of the sub-process instance, parent_task_instance_id represents the task instance id of the sub-process node, and the process The instance table and task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table respectively

Review Comment:
   Please wrap `parent_process_instance_id`, `process_instance_id`, and `parent_task_instance_id` in backticks, and add the missing period at the end of the sentence (`respectively` -> `respectively.`).
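
   To make the sub-process relationship concrete, here is a minimal sketch using an in-memory SQLite table with the three columns named in the bullet. The table is reduced to those columns and the rows are invented for illustration; `sub_process_instances` is a hypothetical helper, not part of the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, illustrative version of the table: only the three columns
# mentioned in the doc are modelled here.
conn.execute(
    """
    CREATE TABLE t_ds_relation_process_instance (
        parent_process_instance_id INTEGER,
        parent_task_instance_id INTEGER,
        process_instance_id INTEGER
    )
    """
)
# Invented rows: main process instance 100 contains two sub-process nodes.
conn.executemany(
    "INSERT INTO t_ds_relation_process_instance VALUES (?, ?, ?)",
    [(100, 11, 201), (100, 12, 202), (300, 31, 401)],
)


def sub_process_instances(conn, parent_process_instance_id):
    """Return (parent_task_instance_id, process_instance_id) pairs for a main instance."""
    rows = conn.execute(
        "SELECT parent_task_instance_id, process_instance_id "
        "FROM t_ds_relation_process_instance "
        "WHERE parent_process_instance_id = ? "
        "ORDER BY process_instance_id",
        (parent_process_instance_id,),
    )
    return rows.fetchall()
```

   Each returned pair links the task instance of a sub-process node in the main process instance to the sub-process instance it started.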



##########
docs/docs/en/architecture/metadata.md:
##########
@@ -39,155 +13,28 @@
+- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is code + version. When the pre-task of the task is empty, the corresponding pre_task_node and pre_task_version are 0.

Review Comment:
   Please wrap `code + version`, `pre_task_node`, and `pre_task_version` in backticks.
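
   As a small illustration of the rule in this bullet (a task with no pre-task has `pre_task_node` and `pre_task_version` set to 0), the sketch below finds the entry tasks of a process definition from relation rows. The field names `pre_task_node` and `pre_task_version` follow the wording in the hunk; `post_task_code`, `post_task_version`, the sample rows, and `entry_tasks` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Invented relation rows; real rows in t_ds_process_task_relation carry more columns.
relations: List[Dict[str, int]] = [
    {"pre_task_node": 0, "pre_task_version": 0, "post_task_code": 1001, "post_task_version": 1},
    {"pre_task_node": 1001, "pre_task_version": 1, "post_task_code": 1002, "post_task_version": 1},
    {"pre_task_node": 0, "pre_task_version": 0, "post_task_code": 1003, "post_task_version": 2},
]


def entry_tasks(rows: List[Dict[str, int]]) -> List[Tuple[int, int]]:
    """Return (code, version) of tasks whose pre-task is empty (both fields are 0)."""
    return [
        (row["post_task_code"], row["post_task_version"])
        for row in rows
        if row["pre_task_node"] == 0 and row["pre_task_version"] == 0
    ]
```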





[GitHub] [dolphinscheduler] Tianqi-Dotes commented on pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
Tianqi-Dotes commented on PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#issuecomment-1166943498

   `Docs / Image Check (pull_request) Failing after 5s — Image Check`
   Does this have any impact?




[GitHub] [dolphinscheduler] caishunfeng commented on pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
caishunfeng commented on PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600#issuecomment-1167033703

   > `Docs / Image Check (pull_request) Failing after 5s — Image Check`
   > Does this have any impact?
   
   I have the same question. PTAL @zhongjiajie




[GitHub] [dolphinscheduler] zhongjiajie merged pull request #10600: [doc] Update metadata and design doc

Posted by GitBox <gi...@apache.org>.
zhongjiajie merged PR #10600:
URL: https://github.com/apache/dolphinscheduler/pull/10600

