Posted to dev@dolphinscheduler.apache.org by 裴龙武 <pe...@qq.com> on 2020/05/13 07:16:15 UTC

[Feature] Support SSH Task and Support dummy task like airflow

Dear ALL:


Support Linux SSH Task

Scenario: in my current project, the tasks in a workflow need to execute shell scripts on different servers. The scripts are stored in fixed directories on the business servers. When a worker schedules a task, it must log in to these servers as a fixed user, execute the shell script, and obtain the task's execution status. The server address, username, and password should be configurable.

SSH Task is very useful for most users

The shell scripts run by distributed scheduling tasks live on different business servers, each with its own fixed business function. These servers are not workers; they only need a worker to trigger execution. We just pass different parameters and let the server run the task script.

Python has a module to execute SSH scripts

Python has a module that can execute shell scripts on remote servers over SSH. It is called paramiko.
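As a rough illustration of the paramiko approach mentioned above, here is a minimal sketch. The function names, the script path, and the host-key policy are my assumptions for illustration and are not taken from the proposal; paramiko is a third-party package and must be installed separately.

```python
import shlex


def build_remote_command(script_path, args):
    """Quote the script path and each argument so the remote shell sees them intact."""
    return " ".join(shlex.quote(part) for part in [script_path, *args])


def run_remote_script(host, user, password, script_path, args=(), port=22):
    """Run a script on a remote host and return (exit_code, stdout, stderr)."""
    # Imported here so build_remote_command stays usable without paramiko installed.
    import paramiko

    client = paramiko.SSHClient()
    # Assumption: the target hosts are already in known_hosts; unknown hosts are rejected.
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.RejectPolicy())
    try:
        client.connect(hostname=host, port=port, username=user, password=password)
        _stdin, stdout, stderr = client.exec_command(
            build_remote_command(script_path, args)
        )
        # Reading stdout and then stderr is fine for small outputs;
        # very large stderr output may need interleaved reads.
        out = stdout.read().decode()
        err = stderr.read().decode()
        # recv_exit_status() blocks until the remote command has finished.
        exit_code = stdout.channel.recv_exit_status()
        return exit_code, out, err
    finally:
        client.close()
```

A worker could treat a non-zero exit code as task failure, which covers the "obtain the execution status" part of the scenario.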

Others

I found this described in a previous feature request, but that description was relatively simple.
Feature URL

In addition, it is very inconvenient for me to run remote tasks through a Shell Task. Here is my script. I don't know if there is a better way.
sshpass -p 'password' ssh user@host
echo 'ssh success'
echo 'Hello World' -> /home/dolphinscheduler/test/hello.txt
echo 'end'



Support dummy task like Airflow

Scenario: my project has a productized DAG file that contains different modules. Some tasks in these modules depend on each other and some do not. When customers purchase different modules, the tasks that belong to unpurchased modules, and that no purchased module depends on, need to be set as dummy tasks so that they are not actually executed. The benefits of this setup are product uniformity and a complete graph. In Airflow, this is done with DummyOperator.

Implementation

A dummy task is simple to implement, but it needs to work together with other tasks. When a task's execution type is set to dummy, the real task is not executed and a dummy task runs in its place.
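To make the dummy behavior concrete, here is a small sketch of a DAG runner in which tasks flagged as dummy are skipped but still count as finished, so downstream tasks can proceed. It is not DolphinScheduler or Airflow code; the function name, the state labels, and the task names are all hypothetical, and the graph is assumed to be acyclic.

```python
def run_dag(tasks, dependencies, dummy):
    """Run tasks in dependency order; tasks named in `dummy` are not executed.

    tasks:        dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> list of upstream task names
    dummy:        set of task names to treat as dummy tasks
    Returns a dict mapping task name -> "SUCCESS" or "DUMMY".
    """
    state = {}

    def visit(name):
        if name in state:
            return
        # Finish every upstream task first (assumes the graph has no cycles).
        for upstream in dependencies.get(name, []):
            visit(upstream)
        if name in dummy:
            state[name] = "DUMMY"      # the real work is skipped
        else:
            tasks[name]()
            state[name] = "SUCCESS"

    for name in tasks:
        visit(name)
    return state


if __name__ == "__main__":
    names = ["extract", "report_a", "report_b", "publish"]
    tasks = {n: (lambda n=n: print("running", n)) for n in names}
    deps = {
        "report_a": ["extract"],
        "report_b": ["extract"],
        "publish": ["report_a", "report_b"],
    }
    # report_b belongs to a module the customer did not purchase.
    print(run_dag(tasks, deps, dummy={"report_b"}))
```

The graph keeps its full shape, and only the set of dummy task names changes per customer.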




By the way, because my project urgently needed this for testing, I forked the development branch and implemented both task types. Can they be supported in a follow-up release?

Re: [Feature] Support SSH Task and Support dummy task like airflow

Posted by 裴龙武 <pe...@qq.com>.
Could you give me an example? Thanks!


By the way, in my real scenario I have more than 100 SSH tasks in one DAG. These tasks connect to two task servers to execute, so the SSH connections must be managed by a connection pool. Currently I use JSch and have implemented a simple pool.
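The pooling idea can be sketched as follows. This is not the JSch-based pool from my fork, which is written in Java; it is a language-neutral illustration in Python, and the class name, the `connect` factory, and the `max_per_host` limit are all hypothetical.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """A small blocking pool keyed by host; connections are created lazily."""

    def __init__(self, connect, max_per_host=4):
        self._connect = connect        # hypothetical factory: host -> connection
        self._max = max_per_host
        self._idle = {}                # host -> queue.Queue of idle connections
        self._created = {}             # host -> number of connections created
        self._lock = threading.Lock()

    @contextmanager
    def connection(self, host, timeout=None):
        conn = self._acquire(host, timeout)
        try:
            yield conn
        finally:
            self._idle[host].put(conn)  # hand the connection back for reuse

    def _acquire(self, host, timeout):
        with self._lock:
            idle = self._idle.setdefault(host, queue.Queue())
            try:
                return idle.get_nowait()
            except queue.Empty:
                pass
            create = self._created.get(host, 0) < self._max
            if create:
                self._created[host] = self._created.get(host, 0) + 1
        if create:
            try:
                return self._connect(host)
            except Exception:
                with self._lock:
                    self._created[host] -= 1  # give the slot back on failure
                raise
        # Pool exhausted: wait until another task returns a connection.
        # Raises queue.Empty if the timeout expires.
        return idle.get(timeout=timeout)
```

With 100+ tasks and two servers, each task would wrap its remote call in `with pool.connection(host) as conn:` so that at most `max_per_host` sessions are open per server. The sketch does not check whether a returned connection is still alive; a real pool would need that.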


------------------ Original Message ------------------
From: "wenhemin" <whm_777@163.com>;
Sent: Wednesday, May 13, 2020, 5:24 PM
To: "dev" <dev@dolphinscheduler.apache.org>;

Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow



The shell node supports remote calls and can get the remote command's result code.



Re: [Feature] Support SSH Task and Support dummy task like airflow

Posted by wenhemin <wh...@163.com>.
The shell node supports remote calls and can get the remote command's result code.
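One possible reading of this, sketched in Python rather than as a shell node script: the OpenSSH client exits with the remote command's exit status (or 255 if ssh itself fails), so a wrapper that calls `ssh` can report the remote result code. The function names, the user, the host, and the script path are hypothetical, and key-based authentication is assumed.

```python
import subprocess


def build_ssh_argv(user, host, remote_command):
    """Build the argument list for the OpenSSH client."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which suits unattended tasks (assumes key-based login is set up).
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def run_remote(user, host, remote_command, runner=subprocess.run):
    """Run a command on a remote host and return its exit code.

    ssh exits with the remote command's exit status, or 255 if ssh itself failed.
    """
    completed = runner(build_ssh_argv(user, host, remote_command))
    return completed.returncode
```

For example, `run_remote("etl", "biz-server-1", "sh /opt/jobs/run.sh")` returns 0 only if the remote script succeeded.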



