Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/11/21 13:38:53 UTC

[GitHub] [dolphinscheduler] weixiaonan1 opened a new issue, #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)

weixiaonan1 opened a new issue, #12953:
URL: https://github.com/apache/dolphinscheduler/issues/12953

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   This issue consists of two parts:
   1. Add a message queue alert plugin, alert-plugin-MessageQueue (MQ), that sends workflow execution results to an MQ specified by the user.
   2. In the SQL task, add an option to send the results of query-type SQL to an MQ (using the alert group described in item 1), in addition to email.
   
   These two features would make it much easier to integrate DolphinScheduler (DS) with third-party systems.
   1. Many third-party systems use DS as infrastructure and manage projects and workflows through its open API: creating workflows, starting them manually, and setting schedules for automatic runs. In many situations these systems need the execution result so they can continue running their own code. However, when we (or a schedule) start a workflow, we cannot learn the result without polling its status. The existing alert plugins, such as DingTalk and email, are hard to integrate into third-party systems for retrieving results. A better approach is to **send execution results to a message queue**, from which these systems can consume the messages.
   2. In some situations we have many SQL statements that must run on different database instances, or even on different types of databases. We can hand them to DS, because DS supports many datasource types such as MySQL and PostgreSQL (and is easy to extend to others, such as DM), and it can run the statements in parallel across many workers. Currently, however, we cannot retrieve the query results: they are never stored and can only be sent by email. Once alert-plugin-MQ is implemented, we can add an option to select an alert group (backed by alert-plugin-MQ) that sends the results to an MQ for third-party systems to consume.
   
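   This issue actually covers two things:
   1. Add a message queue plugin to the alert plugins, which writes workflow execution results to a message queue specified by the user.
   2. In the SQL task, besides emailing the results of query-type SQL, also support writing them to a message queue (the alert group from item 1 can be reused).
   
   These two features would greatly ease the integration of DolphinScheduler (DS) with third-party systems.
   
   1. Many systems use DS as underlying infrastructure and manage projects and workflows through the open API that DS provides, for example creating workflows, starting workflows manually, and giving workflows schedules for DS to run. However, when a workflow is started manually or by a schedule, we cannot get the execution result promptly (scheduled runs start automatically, and the manual-execution API is asynchronous). Third-party systems usually need the result so they can run the corresponding logic on success or failure, but the existing alert plugins (email, DingTalk, etc.) are hard to integrate into third-party systems for retrieving results, and polling the workflow status is also expensive. A better approach is to use a message queue: write the execution results to it for third-party systems to consume.
   2. In some scenarios, for example when many SQL statements must run on different database instances or even different types of databases, we hand the SQL to DS. This makes use of DS's support for many datasources (MySQL, Oracle, and easy extension to others such as the Dameng database), and of its distributed execution to run the statements in parallel for efficiency. After the SQL finishes, the results are sent to the third-party system, which continues with its next steps. But the current SQL task does not store its results and can only send them by email, so third-party systems cannot get them. With a message queue alert plugin, we could specify an MQ alert group in the Query SQL task just as we do for email, write the query results to the message queue, and let third-party systems consume them promptly.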
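   A minimal sketch of the consumer side of item 1, in Python, assuming a JSON message body. The field names (`workflowInstanceId`, `state`) and the state strings are illustrative assumptions, not a format DolphinScheduler is known to emit; the function only shows how a third-party system could route a pushed result to its success or failure logic instead of polling.

```python
import json
from typing import Any, Callable, Dict

# Assumed state names: which strings count as success or failure depends on
# the real message format and should be checked against it.
SUCCESS_STATES = {"SUCCESS"}
FAILURE_STATES = {"FAILURE", "STOP", "KILL"}


def handle_workflow_result(
    body: bytes,
    on_success: Callable[[Dict[str, Any]], None],
    on_failure: Callable[[Dict[str, Any]], None],
) -> str:
    """Decode one workflow-result message and route it by its final state.

    Returns "success", "failure", or "ignored" so the caller can log or ack.
    """
    msg = json.loads(body.decode("utf-8"))
    state = str(msg.get("state", "")).upper()
    if state in SUCCESS_STATES:
        on_success(msg)
        return "success"
    if state in FAILURE_STATES:
        on_failure(msg)
        return "failure"
    # Non-terminal or unknown states are not guessed at.
    return "ignored"
```

   Unknown states return `"ignored"`, so a consumer can acknowledge them without guessing at their meaning.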
   
   
   
   
   
   ### Use case
   
   1. Add alert-plugin-MessageQueue (MQ), using Apache RocketMQ as an example:
       - The user enters ProducerGroupName, NameSvrAddress, Topic, and Tags to specify an MQ as the alert instance.
       - Create an alert group with this instance.
       - Start a workflow, choose this alert group, and set the Notification Strategy to All.
       - Third-party systems consume the messages in the MQ (the screenshot is from RocketMQ-Dashboard).
   <div align="center"><img src="https://user-images.githubusercontent.com/33857431/203058931-ff364c68-c08f-449c-a433-0523e294e127.png" width="500px"></div>
   <div align="center"><img src="https://user-images.githubusercontent.com/33857431/203062914-73524e7f-e001-475a-8883-f94f4204fa65.png" width="500px"></div>
   <div align="center"><img src="https://user-images.githubusercontent.com/33857431/203065905-c76cbbdc-b7b3-4e67-97a2-1634ec7f9bf1.png" width="600px"></div>
   
   2. Add an option to send SQL query results to an MQ:
       - In the Query SQL task, the user chooses to send results to an MQ.
       - The user chooses an MQ alert group.
       - When the task finishes, the query results are sent to the MQ.
       - Third-party systems consume the messages in the MQ.
   
   <div align="center"><img src="https://user-images.githubusercontent.com/33857431/203066830-71d610dc-f918-4110-a9d2-64f678b1e37e.png" width="500px"></div>
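   A sketch of two pieces these use cases would need, in Python: a hypothetical alert-instance record holding the four parameters listed for use case 1, and a helper that packs SQL query rows into size-bounded JSON message bodies for use case 2. The class and function names, the JSON envelope, and the 4 MiB limit (RocketMQ's usual default maximum message size) are assumptions; the actual producer call is left out because it depends on the client library.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed upper bound on one message body; adjust to the broker's real limit.
MAX_BODY_BYTES = 4 * 1024 * 1024


@dataclass
class MqAlertInstance:
    """Hypothetical alert-instance settings mirroring the fields in use case 1."""
    producer_group_name: str
    name_srv_address: str  # e.g. "host1:9876;host2:9876"
    topic: str
    tags: str = ""

    def name_servers(self) -> List[str]:
        # RocketMQ name-server lists are ';'-separated host:port pairs.
        servers = [s.strip() for s in self.name_srv_address.split(";") if s.strip()]
        for s in servers:
            host, sep, port = s.rpartition(":")
            if not sep or not host or not port.isdigit():
                raise ValueError(f"bad name server address: {s!r}")
        return servers


def pack_query_results(
    task_name: str, rows: List[Dict[str, Any]], max_bytes: int = MAX_BODY_BYTES
) -> List[bytes]:
    """Split query rows into JSON message bodies of at most max_bytes each.

    Each body carries the task name, a part index, and a slice of rows.
    A row that alone exceeds max_bytes raises ValueError instead of being cut.
    Re-encoding the growing batch is quadratic, but keeps the size check exact.
    """
    bodies: List[bytes] = []
    batch: List[Dict[str, Any]] = []

    def encode(part_rows: List[Dict[str, Any]]) -> bytes:
        # default=str covers values such as datetimes or decimals from SQL rows.
        envelope = {"task": task_name, "part": len(bodies), "rows": part_rows}
        return json.dumps(envelope, ensure_ascii=False, default=str).encode("utf-8")

    for row in rows:
        if len(encode(batch + [row])) <= max_bytes:
            batch.append(row)
            continue
        if not batch:
            raise ValueError("a single row exceeds max_bytes")
        bodies.append(encode(batch))  # flush the full batch as its own part
        batch = [row]
        if len(encode(batch)) > max_bytes:
            raise ValueError("a single row exceeds max_bytes")
    if batch:
        bodies.append(encode(batch))
    return bodies
```

   Splitting on whole rows keeps every message independently parseable, so a consumer can process parts as they arrive and reassemble by `part` if it needs the full result set.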
   
   
   ### Related issues
   
   [Feature] Send to message queue request after reading data #657
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] Radeity commented on issue #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)

Posted by GitBox <gi...@apache.org>.
Radeity commented on issue #12953:
URL: https://github.com/apache/dolphinscheduler/issues/12953#issuecomment-1322932154

   Hi, @weixiaonan1 
   It's good to introduce an MQ in your scenario, since it decouples DS from your third-party systems. **However,** whether to contribute this back to DS needs further discussion. In my view, integrating support for such middleware into DS is a bit odd. Instead, for change data capture, you could independently use other middleware that monitors the database change log (such as Canal, Debezium, or Maxwell), which brings more flexibility, or use the http-alert-plugin and develop an HTTP interface to handle the execution results.
   
   WDYT? Discussion is welcome.
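   For comparison, the HTTP-interface alternative could look roughly as sketched here: a small Python receiver built on the standard library's `http.server` that accepts POSTed alerts. The payload is assumed to be a JSON object; the actual body depends on how the http-alert-plugin instance is configured, so `parse_alert`, `AlertHandler`, and the `RECEIVED` list are illustrative names only.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List

# Alerts received so far; a real receiver would persist or forward them.
RECEIVED: List[Dict[str, Any]] = []


def parse_alert(body: bytes) -> Dict[str, Any]:
    """Decode a POSTed alert body, assumed here to be a JSON object."""
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            RECEIVED.append(parse_alert(self.rfile.read(length)))
        except ValueError:
            # JSONDecodeError and UnicodeDecodeError both subclass ValueError.
            self.send_response(400)
            self.end_headers()
            return
        self.send_response(204)  # accepted, no response body
        self.end_headers()


def serve(port: int = 8080) -> None:
    # Blocks forever; run it in its own process or thread.
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

   The trade-off discussed in this thread shows up directly here: the receiver must be reachable from DS and stay up, whereas with an MQ the broker buffers results until the third-party system consumes them.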
   
   




[GitHub] [dolphinscheduler] Radeity commented on issue #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)

Posted by GitBox <gi...@apache.org>.
Radeity commented on issue #12953:
URL: https://github.com/apache/dolphinscheduler/issues/12953#issuecomment-1325883125

   Hi, @weixiaonan1. You can send an email to [dev@dolphinscheduler.apache.org](https://lists.apache.org/list.html?dev@dolphinscheduler.apache.org), where you can get more feedback!
   




Re: [I] [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task) [dolphinscheduler]

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 closed issue #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)
URL: https://github.com/apache/dolphinscheduler/issues/12953




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #12953:
URL: https://github.com/apache/dolphinscheduler/issues/12953#issuecomment-1322080540



[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)

Posted by GitBox <gi...@apache.org>.
weixiaonan1 commented on issue #12953:
URL: https://github.com/apache/dolphinscheduler/issues/12953#issuecomment-1323164647

   @Radeity Thanks for your feedback.
   
   Currently, our alert groups only aim to notify the user; the notification marks the end of a workflow execution. The same goes for sending SQL query results by email. So if we integrate MQ support into DS and send these messages to an MQ to be consumed by other systems, it really is quite different from the current model and looks odd. However, from the perspective of DS itself, this is only a limited extension with no changes to existing functions. Adding alert-plugin-MQ and sending workflow execution results to an MQ is the same as sending them through DingTalk or email: DS finishes its job once the messages are sent. The only difference is that users can handle these messages more flexibly. DS doesn't need to know about the subsequent processing in third-party systems; it is completely decoupled. The same holds for sending SQL query results to an MQ: DS just sends the messages to the MQ, as it sends email now. Users can write the messages to their own database, or send a more complex email after processing them.
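   As an illustration of users writing messages to their own database, here is a minimal Python consumer-side sketch that stores each result message in SQLite. The table layout and the message fields (`workflowInstanceId`, `state`) are hypothetical; `INSERT OR REPLACE` keeps one row per workflow instance, so consuming the same message twice does not create a duplicate row.

```python
import json
import sqlite3


def ensure_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per workflow instance.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow_result ("
        " instance_id INTEGER PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )


def store_result(conn: sqlite3.Connection, body: bytes) -> None:
    """Persist one consumed message; the field names are assumptions."""
    msg = json.loads(body.decode("utf-8"))
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT OR REPLACE INTO workflow_result (instance_id, state, payload) "
            "VALUES (?, ?, ?)",
            (int(msg["workflowInstanceId"]), str(msg["state"]), json.dumps(msg)),
        )
```

   The last consumed state wins, which suits a consumer that only cares about the final outcome; DS itself is unaffected by whatever the consumer does here.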
   
   I don't think the two approaches you mentioned are ideal. Using CDC middleware in third-party systems adds complexity and is too heavyweight, while using the http-alert-plugin increases the coupling between DS and third-party systems. In this scenario, an MQ is a better solution.
   
   Of course, we need to see whether many people have similar requirements before deciding whether DS should provide these MQ-related features. I'd be very glad to participate in the development.
   
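   The alert groups DS currently provides only **notify** users of results; once the notification is done, the whole workflow execution forms a **closed loop**. The same applies to emailing query results in the SQL task component. Introducing a message queue breaks this original intent: execution or query results are written to the MQ and consumed by other systems, which does look a bit odd. But from DS's point of view, this does not affect any existing logic at all; it is only a limited extension. Adding an MQ alert group that sends execution results to users through an MQ is the same as sending them via DingTalk or email: once the notification is sent, DS's job is done. Users can simply process the results more flexibly by consuming the MQ messages, and how they do so is completely decoupled from DS, which does not need to care. The same applies to sending query results from the SQL task component to an MQ: users can use the MQ to write the query results into their own database, or process them and send a more complex email, and so on. DS only needs to write the query results to the MQ, just as it sends email now.
   
   Of the two approaches suggested above, using CDC middleware (Canal, etc.) greatly increases complexity, and using the http-alert-plugin couples DS fairly tightly with the integrated system. In the two scenarios I described, the MQ approach works better.
   
   Of course, whether DS should provide these two features depends on whether these requirements are special cases or common needs. If many people need this, I'd be glad to take part in the development. I hope everyone will join the discussion.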




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #12953: [Feature][alert-plugin & task-SQL] Send results to message queue (result of workflow execution and SQL query task)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #12953:
URL: https://github.com/apache/dolphinscheduler/issues/12953#issuecomment-1322080880

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * To help us understand your request as soon as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`

