
Posted to dev@dolphinscheduler.apache.org by Mad <10...@qq.com> on 2020/07/01 13:45:44 UTC

Program discussion for "Task result/variable transfer"

English:
 
Technical solutions:
 
Core:
 
Task results and variables are passed between task nodes by dynamically modifying the values of a small set of global variables.
 
Reason:
 
1. Global variables are visible to all task nodes in the task flow.
 
2. Task instances execute serially, so transfers of task results and variables between task nodes never happen at the same time. Each transfer can therefore be mapped onto a limited number of pre-allocated global variables.
 
Detail:
 
1. Traverse the task nodes once in order, starting with a global parameter count n=0. On reaching an upstream node that passes parameters, add the number of parameters it passes to n; on reaching a downstream node that receives parameters, subtract the number it receives from n. Record the maximum value of n during the traversal and let N=n_max. Create N global variables, named with reserved fields or random strings, such as {"G1","G2","G3",...,"GN"}, and add them to the set U.

2. Execute the task node task_A (execution is serial) and find the downstream task node task_B to which it passes parameters. If task_A does not pass any parameters, skip the following steps.

3. Map the M parameters that task_B needs to receive to M global variables taken from the set U: {"Param1":"Gu+1", "Param2":"Gu+2", "Param3":"Gu+3", ..., "ParamM":"Gu+M"}, where M<=N.

4. After task_A finishes, copy its parameters to the global variables according to the mapping built in step 3.

5. Before task_B runs, copy the global variables to its parameters according to the mapping built in step 3.

6. After task_B finishes, return the global variables that correspond to those parameters to the set U.
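
To make the counting in step 1 concrete, here is a minimal Python sketch. It is not DolphinScheduler code: the function name `count_required_globals` and the `order`, `outgoing` and `incoming` inputs are hypothetical. It also assumes that a node which both receives and passes parameters holds both sets of global variables while it runs, which the proposal does not state explicitly.

```python
from typing import Dict, List


def count_required_globals(
    order: List[str],
    outgoing: Dict[str, int],
    incoming: Dict[str, int],
) -> int:
    """Return N, the peak number of global variables in use at once.

    order    -- task names in serial execution order (hypothetical input)
    outgoing -- number of parameters each task passes downstream
    incoming -- number of parameters each task receives from upstream
    """
    n = 0
    n_max = 0
    for task in order:
        # An upstream node reserves one global per parameter it passes.
        n += outgoing.get(task, 0)
        n_max = max(n_max, n)
        # A downstream node frees the globals it has consumed.
        n -= incoming.get(task, 0)
    return n_max


# A passes 2 parameters, B passes 3, and C receives all 5.
N = count_required_globals(["A", "B", "C"], {"A": 2, "B": 3}, {"C": 5})
pool_names = [f"G{i}" for i in range(1, N + 1)]  # the initial set U
```

With these inputs the running count goes 2, 5, 0, so N is 5 and the pool holds G1 through G5.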
 
 
Example:
 
[Figure: the execution process]

[Figure: the parameter transfer process]
 
 
Node A passes two parameters, a and b, to node C;

Node B passes three parameters, c, d and e, to node C;
 
1. N=5, U={G1, G2, G3, G4, G5};
 
2. Execute task node A and find that node C is its downstream node;
 
3. Get the mapping {a:G1, b:G2}; remaining U={G3,G4,G5};
 
4. After A is executed, synchronize a and b to G1 and G2;
 
5. Execute task node B and find that node C is its downstream node;
 
6. Get the mapping {c:G3, d:G4, e:G5}; remaining U={};
 
7. After B is executed, synchronize c, d, e to G3, G4, G5;
 
8. Before task node C runs, synchronize G1, G2, G3, G4, G5 to a, b, c, d, e; after C finishes, the variables are returned, so U={G1, G2, G3, G4, G5};
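
The walk-through above can be reproduced with a small Python sketch. The `GlobalPool` class and its methods are hypothetical names used only for illustration, and the parameter values (10, 20, ...) are made up. It models the set U as a list from which global variables are taken and to which they are returned.

```python
from typing import Dict, List


class GlobalPool:
    """Hypothetical model of the set U of reusable global variables."""

    def __init__(self, size: int) -> None:
        self.free: List[str] = [f"G{i}" for i in range(1, size + 1)]
        self.values: Dict[str, object] = {}

    def allocate(self, params: List[str]) -> Dict[str, str]:
        """Map each parameter to a global variable taken from U."""
        if len(params) > len(self.free):
            raise RuntimeError("not enough free global variables in U")
        return {param: self.free.pop(0) for param in params}

    def publish(self, mapping: Dict[str, str], outputs: Dict[str, object]) -> None:
        """After the upstream task: copy its parameters into the globals."""
        for param, name in mapping.items():
            self.values[name] = outputs[param]

    def read(self, mapping: Dict[str, str]) -> Dict[str, object]:
        """Before the downstream task: copy the globals into its parameters."""
        return {param: self.values[name] for param, name in mapping.items()}

    def release(self, mapping: Dict[str, str]) -> None:
        """After the downstream task: return the globals to U."""
        for name in mapping.values():
            self.values.pop(name, None)
            self.free.append(name)


pool = GlobalPool(5)                              # N=5, U={G1..G5}
map_a = pool.allocate(["a", "b"])                 # {a:G1, b:G2}
pool.publish(map_a, {"a": 10, "b": 20})           # after A is executed
map_b = pool.allocate(["c", "d", "e"])            # {c:G3, d:G4, e:G5}
free_after_b = list(pool.free)                    # U is now empty
pool.publish(map_b, {"c": 30, "d": 40, "e": 50})  # after B is executed
mapping_c = {**map_a, **map_b}
inputs_c = pool.read(mapping_c)                   # before C is executed
pool.release(mapping_c)                           # after C is executed
```

Taking variables from the front of the list gives the same G1..G5 assignment as the example. A real implementation would also need to decide what happens when U runs out, which this sketch handles only by raising an error.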
 
 

Re: Program discussion for "Task result/variable transfer"

Posted by Jave-Chen <ke...@foxmail.com>.

Can tasks change parameters?
For example:
- the default value of G1 is "1"
- task_A changes G1 to "2"
- task_B reads G1; is the value of G1 then "2"?



