Posted to commits@dolphinscheduler.apache.org by "weixiaonan1 (via GitHub)" <gi...@apache.org> on 2023/08/30 03:16:55 UTC

[GitHub] [dolphinscheduler] weixiaonan1 opened a new issue, #14832: [Feature][Listener]Implementation of Listener Mechanism

weixiaonan1 opened a new issue, #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   This is a [DolphinScheduler project of OSPP-2023](https://summer-ospp.ac.cn/org/prodetail/232290593)
   This implementation allows users to add listeners for monitoring and tracking workflow/task state via plugins, without restarting the service.
   1. Users can customize **listener plugins** by building a plugin JAR that implements the methods in `ListenerPlugin` and registering it in the `Security` module.
   2. Users can create **listener instances** from a listener plugin and choose the **listener events** to subscribe to in the `Security` module.
   3. The **listener events** monitored by a **listener instance** are pushed to the specified destination (message queues, external system interfaces, etc.).
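   As a sketch of how point 1 could look in practice, a plugin might implement an interface along these lines (the `ListenerEvent` and `ListenerPlugin` shapes below are illustrative assumptions, not the final DolphinScheduler API):

   ```java
   import java.util.List;

   // Hypothetical event contract: type plus an opaque payload.
   interface ListenerEvent {
       String type();     // e.g. "WORKFLOW_START", "SERVER_DOWN"
       String payload();  // event body; format is up to the plugin
   }

   // Hypothetical plugin contract registered via the Security module.
   interface ListenerPlugin {
       String name();                      // plugin identifier shown in the UI
       List<String> supportedEventTypes(); // events this plugin can subscribe to
       void onEvent(ListenerEvent event);  // push to Kafka, an HTTP endpoint, etc.
   }

   // A trivial plugin that just records the last event it saw.
   class LoggingListenerPlugin implements ListenerPlugin {
       String lastEvent = "";

       @Override
       public String name() { return "logging-listener"; }

       @Override
       public List<String> supportedEventTypes() {
           return List.of("WORKFLOW_START", "WORKFLOW_END");
       }

       @Override
       public void onEvent(ListenerEvent event) {
           lastEvent = event.type() + ":" + event.payload();
       }
   }
   ```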
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1704043702

   > > @ruanwenjun @zixi0825
   > > 1. Do we need **Global Alarm Types**? That means the alert instance accepts all events by default, without specifying a specific workflow.
   > > 
   > >    1. overhead: Adding a global alarm type will incur extra overhead. We need to query the database to see whether there is a global alert instance before creating events. For example, when a workflow starts, we query the database first and, if there are any global alert instances, create a WorkflowStart event and save it in the database.
   >
   > I am not clear why we need to `query if there exist xx event before workflow start`. If we want to send the workflow start event, we just write an event record when the workflow starts. Do you mean that when we start a workflow twice we only send one event? That is unreasonable. Besides, this work is done by the alert server, so it will not add overhead.
   
   We query the database to check whether there exist **global alert instances**, rather than `query if there exist xx events`, before creating events. Events are only generated when global alert instances actually exist.
   Taking a workflowEnd event as an example: in the current design, when a workflow is initiated, we bind an alert group to the workflow instance. Therefore, when workflow execution ends, we can find the alert strategy and alert group directly from `WorkflowInstance`, and if the alert strategy meets the requirement and an alert group is bound to the instance, we create the workflowEndEvent. With the addition of global alarm types, global alert instances are not bound to workflow instances, so when workflow execution ends (or begins, or in any other scenario that may generate events), it is necessary to query the database to check whether global alert instances exist. Of course, this process can be optimized: for instance, we can query for the alert instance only once, when constructing the `WorkflowExecuteRunnable`, rather than querying the database each time an event might be generated.
   
   > 
   > > 2. Do we need **more Alarm event types**? Such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.
   > > 3. **Flexibility:** In the current alert module, the title and content are determined when the master server creates an alert, so the format of messages is the same for different alert plugins. Do we need to add some flexibility? Similar to a KafkaListener, plugins could generate messages in different formats and perform different processing logic for different event types.
   > 
   > In fact the AlertRecord only needs to contain some metadata; the content/title or anything else should be generated by the alert sender (kafka/email...).
   
   You are right. The current design of the sender is flexible enough.
   > 
   > > 4. **Failed Alert Messages:** Do we consider resending alert messages after they failed to send?
   > >    
   > >    1. If we resend the failed messages: To ensure that messages sent by the same alert instance are in order (e.g., workflowStartEvent should precede workflowEndEvent) and do not affect other instances, the current message processing method needs to be changed. Each alert instance should process its events in chronological order.
   > 
   > Right now, the alert server uses a single loop thread to process the events, so this is guaranteed.
   > 
   
   The Alert Server only processes PendingAlerts and does not handle FailedAlerts, which ensures the order of events. But if we want to **retry FailedAlerts**, things become more complex.
   - Out-of-order messages: for alert instance `instanceA`, if the `process1` start event fails to send and the `process1` end event succeeds, then retrying the `process1` start event successfully results in out-of-order events.
   - Availability: if the messages for alert instance `instanceA` continuously fail to send and a large number of events accumulates, it can prevent other alert instances from sending their messages.
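   The ordering concern can be illustrated with a toy per-instance queue: if each alert instance drains its own FIFO and halts at the first failed send, a retry can never overtake a later event, and a stuck instance does not block the others. This is only a model of the idea, not the alert server's actual code:

   ```java
   import java.util.ArrayDeque;
   import java.util.ArrayList;
   import java.util.Deque;
   import java.util.List;
   import java.util.function.Predicate;

   class PerInstanceAlertQueue {
       private final Deque<String> pending = new ArrayDeque<>();
       final List<String> sent = new ArrayList<>();

       void enqueue(String event) { pending.addLast(event); }

       // Drain in order; stop at the first failure so a later event can
       // never overtake an earlier one (the failed head is retried on the
       // next drain pass).
       void drain(Predicate<String> sendOk) {
           while (!pending.isEmpty()) {
               String head = pending.peekFirst();
               if (!sendOk.test(head)) {
                   return; // halt this instance; others are unaffected
               }
               sent.add(pending.removeFirst());
           }
       }
   }
   ```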
   




[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1723996872

   I think we can split this task into several subtasks.
   
   1. In this OSPP project, we will not change the alert mechanism; we will just add an option for creating a global alert instance, distinct from the existing alert instances. The global alert instance will process the various global events.
   2. In future PRs, we can make some modifications to the alert mechanism, such as refining `ServerAlertContent`, `ProcessAlertContent`, `TaskAlertContent`, and `DqExecuteResultAlertContent` into more specific event types, refactoring `AlertData` so that the AlertPlugin processes events directly, and so on. Since we have introduced many types of events (processCreateEvent, etc.) in addition to the existing alert events, the term "alert module" may no longer be suitable; it might need to be refactored into a "notification module", and more changes would need to be made.




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "ruanwenjun (via GitHub)" <gi...@apache.org>.
ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1703657948

   > @ruanwenjun @zixi0825
   > 
   > I think we should discuss these features first, before deciding whether to create a new listener module or to develop based on the alert module.
   > 
   > First and foremost, we agree on this consensus: we cannot guarantee that events/alerts will never be lost, because we don't want to make DS vulnerable.
   > 
   > 1. Do we need **Global Alarm Types**? That means the alert instance accepts all events by default, without specifying a specific workflow.
   >    
   >    1. overhead: Adding a global alarm type will incur extra overhead. We need to query the database to see whether there is a global alert instance before creating events. For example, when a workflow starts, we query the database first and, if there are any global alert instances, create a WorkflowStart event and save it in the database.
   
   I am not clear why we need to `query if there exist xx event before workflow start`. If we want to send the workflow start event, we just write an event record when the workflow starts. Do you mean that when we start a workflow twice we only send one event? That is unreasonable. Besides, this work is done by the alert server, so it will not add overhead.
   
   >    2. abstract interface: If a global alarm type is added, there will be a lot of events that need to be stored and retrieved later. Do we need to design an interface for saving and retrieving events? We could choose to store events in other places (like Redis or another database) besides the current database. The save method would be wrapped in a try-catch block, so it does not affect the normal execution of the workflow.
   
   This doesn't conflict with the current design; you can add an event store in the alert module. We don't want the master to connect to many third-party systems. DS is a basic system: Redis/Kafka or other systems might themselves rely on DS, so the DS core server shouldn't rely on them.
   
   > 2. Do we need **more Alarm event types**? Such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.
   > 3. **Flexibility:** In the current alert module, the title and content are determined when the master server creates an alert, so the format of messages is the same for different alert plugins. Do we need to add some flexibility? Similar to a KafkaListener, plugins could generate messages in different formats and perform different processing logic for different event types.
   
   In fact the AlertRecord only needs to contain some metadata; the content/title or anything else should be generated by the alert sender (kafka/email...).
   
   > 4. **Failed Alert Messages:** Do we consider resending alert messages after they failed to send?
   >    
   >    1. If we resend the failed messages: To ensure that messages sent by the same alert instance are in order (e.g., workflowStartEvent should precede workflowEndEvent) and do not affect other instances, the current message processing method needs to be changed. Each alert instance should process its events in chronological order.
   
   Right now, the alert server uses a single loop thread to process the events, so this is guaranteed.
   
   > 5. **Dynamic Plugin:** In the design of this listener module, we can upload a custom plugin jar, which is dynamically loaded through a URLClassLoader. We can install/update/uninstall the plugin jar on a separate page in the `Security` module. If we develop based on the alert module, should we keep this design? [[Feature][task] Supports custom tasks #14817](https://github.com/apache/dolphinscheduler/issues/14817) says it needs a page to support uploading jars of custom task plugins; maybe it is better to keep this design and expand it to manage various types of dynamic plugins (alert plugins, task plugins, etc.).
   
   This is another issue. In fact, in the early implementation before 3.0, we used a custom classloader to load the plugin jar; after some discussion we decided this was overdesign, so we changed to the JVM's default SPI loader to load plugins. This is not related to this feature.
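   For reference, the JVM-default SPI loading mentioned here works through `java.util.ServiceLoader`, which discovers providers from a `META-INF/services/<interface-binary-name>` file on the classpath rather than through a custom classloader. A minimal sketch (the interface and class names are illustrative, not DolphinScheduler's real ones):

   ```java
   import java.util.ServiceLoader;

   interface AlertChannelFactory {
       String channelName();
   }

   class DemoChannelFactory implements AlertChannelFactory {
       public String channelName() { return "demo"; }
   }

   class SpiDiscovery {
       // A plugin JAR would ship a provider-configuration file named
       // META-INF/services/AlertChannelFactory whose content is the line
       // "DemoChannelFactory"; ServiceLoader then instantiates each
       // provider listed there when the iterable is traversed.
       static Iterable<AlertChannelFactory> discover() {
           return ServiceLoader.load(AlertChannelFactory.class);
       }
   }
   ```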
   
   




[GitHub] [dolphinscheduler] zixi0825 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "zixi0825 (via GitHub)" <gi...@apache.org>.
zixi0825 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1699031843

   > > Why don't we directly use the alert module?
   > 
   > Although the Alert module and Listener module are similar in specific scenarios, such as both supporting notifying users about the execution results of workflows/tasks through various plugins, there are several key differences:
   > 
   > 1. **Purpose:**
   >    
   >    * **Alert Module:** The Alert module is designed specifically to inform users about the execution results of individual workflows or tasks (as specified by the user when running the workflow). The content and format of alerts are fixed. Alerts are sent directly to users via email, Telegram, DingTalk, etc.
   >    * **Listener Module:** The Listener module is designed for third-party systems to monitor and manage DolphinScheduler. The listener instances are global and not limited to specific workflows/tasks, so in addition to workflow/task-related events like workflowStartEvent, workflowFailedEvent, etc., it also provides system-wide events like serverDownEvent, workflowAddedEvent, etc. Users can customize the format and content of messages based on listener events. The messages are sent to Kafka, RabbitMQ, external system interfaces, etc.
   > 2. **Flexibility and Extensibility:**
   >    
   >    * **Alert Module:** Alert plugins are built-in. Creating a new alert plugin requires modifying the source code, repackaging, and restarting the service. The content and format of alerts are fixed.
   >    * **Listener Module:** The listener module offers more flexibility. We can create a separate project to develop new plugins that depends only on a single module (`dolphinscheduler-common`), without modifying the source code. Listener plugins can be installed, updated, or uninstalled online without restarting the server. The listener module also provides more flexibility and convenience for customizing message formats and processing methods.
   
   To add: the listener also reuses the capabilities of the alert server to a certain extent, without creating a separate service; it sends events through the AlertServer. However, due to the differences in usage, message type, and receiver, it is necessary to design a separate listener module.




[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1711039263

   > The event generator doesn't need to care about whether an event consumer exists.
   
   I think it is necessary to check whether an event consumer exists. The number of events can be large; if no event consumer exists, we will do a lot of useless work: generate events, save them to the DB, read them from the DB, and delete them because no instance can process them. So it would be better to have a switch that controls whether events are generated, and querying `t_ds_alert_plugin_instance` is a good option. This table is small, and we can query it only once, when constructing the `WorkflowExecuteRunnable`, to control whether WorkflowStart/End/Fail and TaskStart/End/Fail events are generated.
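   The switch described above could be as simple as caching the result of one lookup at construction time. A minimal sketch with illustrative names (only `t_ds_alert_plugin_instance` and `WorkflowExecuteRunnable` come from the discussion; the rest is an assumption):

   ```java
   import java.util.List;
   import java.util.function.Supplier;

   class GlobalEventSwitch {
       private final boolean enabled;

       // `queryGlobalInstances` models the single query against
       // t_ds_alert_plugin_instance, run once when the
       // WorkflowExecuteRunnable is constructed.
       GlobalEventSwitch(Supplier<List<String>> queryGlobalInstances) {
           this.enabled = !queryGlobalInstances.get().isEmpty();
       }

       // Consulted wherever a WorkflowStart/End/Fail or TaskStart/End/Fail
       // event might be produced; skips all event work when nobody listens.
       boolean shouldGenerateEvents() { return enabled; }
   }
   ```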




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "ruanwenjun (via GitHub)" <gi...@apache.org>.
ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1698736660

   Why don't we directly use the alert module?




[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1698919134

   > Why don't we directly use the alert module?
   
   Although the Alert module and Listener module are similar in specific scenarios, such as both supporting notifying users about the execution results of workflows/tasks through various plugins, there are several key differences:
   
   1. **Purpose:**
       - **Alert Module:** The Alert module is designed specifically to inform users about the execution results of individual workflows or tasks (as specified by the user when running the workflow). The content and format of alerts are fixed. Alerts are sent directly to users via email, Telegram, DingTalk, etc.
       - **Listener Module:** The Listener module is designed for third-party systems to monitor and manage DolphinScheduler. The listener instances are global and not limited to specific workflows/tasks, so in addition to workflow/task-related events like workflowStartEvent, workflowFailedEvent, etc., it also provides system-wide events like serverDownEvent, workflowAddedEvent, etc. Users can customize the format and content of messages based on listener events. The messages are sent to Kafka, RabbitMQ, external system interfaces, etc.
   2. **Flexibility and Extensibility:**
       - **Alert Module:** Alert plugins are built-in. Creating a new alert plugin requires modifying the source code, repackaging, and restarting the service. The content and format of alerts are fixed.
       - **Listener Module:** The listener module offers more flexibility. We can create a separate project to develop new plugins that depends only on a single module (`dolphinscheduler-common`), without modifying the source code. Listener plugins can be installed, updated, or uninstalled online without restarting the server. The listener module also provides more flexibility and convenience for customizing message formats and processing methods.




Re: [I] [Feature][Listener]Implementation of Listener Mechanism [dolphinscheduler]

Posted by "SbloodyS (via GitHub)" <gi...@apache.org>.
SbloodyS closed issue #14832: [Feature][Listener]Implementation of Listener Mechanism
URL: https://github.com/apache/dolphinscheduler/issues/14832




[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1699095895

   @zixi0825 Do you mean we need to create a separate server (ListenerServer) instead of reusing the AlertServer?




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "ruanwenjun (via GitHub)" <gi...@apache.org>.
ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1705845077

   > > > @ruanwenjun @zixi0825
   > > > 
   > > > 1. Do we need **Global Alarm Types**? That means the alert instance accepts all events by default, without specifying a specific workflow.
   > > > 2. overhead: Adding a global alarm type will incur extra overhead. We need to query the database to see whether there is a global alert instance before creating events. For example, when a workflow starts, we query the database first and, if there are any global alert instances, create a WorkflowStart event and save it in the database.
   > > 
   > > 
   > > I am not clear why we need to `query if there exist xx event before workflow start`. If we want to send the workflow start event, we just write an event record when the workflow starts. Do you mean that when we start a workflow twice we only send one event? That is unreasonable. Besides, this work is done by the alert server, so it will not add overhead.
   > 
   > We query the database to check whether there exist **global alert instances**, rather than `query if there exist xx events`, before creating events. Events are only generated when global alert instances actually exist. Taking a workflowEnd event as an example: in the current design, when a workflow is initiated, we bind an alert group to the workflow instance. Therefore, when workflow execution ends, we can find the alert strategy and alert group directly from `WorkflowInstance`, and if the alert strategy meets the requirement and an alert group is bound to the instance, we create the workflowEndEvent. With the addition of global alarm types, global alert instances are not bound to workflow instances, so when workflow execution ends (or begins, or in any other scenario that may generate events), it is necessary to query the database to check whether global alert instances exist. Of course, this process can be optimized: for instance, we can query for the alert instance only once, when constructing the `WorkflowExecuteRunnable`, rather than querying the database each time an event might be generated.
   
   The event generator doesn't need to care about whether an event consumer exists.
   
   > 
   > > > 2. Do we need **more Alarm event types**? Such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.
   > > > 3. **Flexibility:** In the current alert module, the title and content are determined when the master server creates an alert, so the format of messages is the same for different alert plugins. Do we need to add some flexibility? Similar to a KafkaListener, plugins could generate messages in different formats and perform different processing logic for different event types.
   > > 
   > > 
   > > In fact the AlertRecord only needs to contain some metadata; the content/title or anything else should be generated by the alert sender (kafka/email...).
   > 
   > You are right. The current design of the sender is flexible enough.
   > 
   > > > 4. **Failed Alert Messages:** Do we consider resending alert messages after they failed to send?
   > > >    
   > > >    1. If we resend the failed messages: To ensure that messages sent by the same alert instance are in order (e.g., workflowStartEvent should precede workflowEndEvent) and do not affect other instances, the current message processing method needs to be changed. Each alert instance should process its events in chronological order.
   > > 
   > > 
   > > Right now, the alert server uses a single loop thread to process the events, so this is guaranteed.
   > 
   > The Alert Server only processes PendingAlerts and does not handle FailedAlerts, which ensures the order of events. But if we want to **retry FailedAlerts**, things become more complex.
   > 
   > * Out-of-order messages: for alert instance `instanceA`, if the `process1` start event fails to send and the `process1` end event succeeds, then retrying the `process1` start event successfully results in out-of-order events.
   > * Availability: if the messages for alert instance `instanceA` continuously fail to send and a large number of events accumulates, it can prevent other alert instances from sending their messages.
   
   




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "ruanwenjun (via GitHub)" <gi...@apache.org>.
ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1699232888

   > > Why don't we directly use the alert module?
   > 
   > Although the Alert module and Listener module are similar in specific scenarios, such as both supporting notifying users about the execution results of workflows/tasks through various plugins, there are several key differences:
   > 
   > 1. **Purpose:**
   >    
   >    * **Alert Module:** The Alert module is designed specifically to inform users about the execution results of individual workflows or tasks (as specified by the user when running the workflow). The content and format of alerts are fixed. Alerts are sent directly to users via email, Telegram, DingTalk, etc.
   >    * **Listener Module:** The Listener module is designed for third-party systems to monitor and manage DolphinScheduler. The listener instances are global and not limited to specific workflows/tasks, so in addition to workflow/task-related events like workflowStartEvent, workflowFailedEvent, etc., it also provides system-wide events like serverDownEvent, workflowAddedEvent, etc. Users can customize the format and content of messages based on listener events. The messages are sent to Kafka, RabbitMQ, external system interfaces, etc.
   > 2. **Flexibility and Extensibility:**
   >    
   >    * **Alert Module:** Alert plugins are built-in. Creating a new alert plugin requires modifying the source code, repackaging, and restarting the service. The content and format of alerts are fixed.
   >    * **Listener Module:** The listener module offers more flexibility. We can create a separate project to develop new plugins that depends only on a single module (`dolphinscheduler-common`), without modifying the source code. Listener plugins can be installed, updated, or uninstalled online without restarting the server. The listener module also provides more flexibility and convenience for customizing message formats and processing methods.
   > 3. **Handling Failure**
   >    
   >    * **Alert Module:** The Alert module won't take any further action when message processing fails. Therefore, from the receiver's perspective, there is a risk of message loss.
   >    * **Listener Module:** Each listener instance has a corresponding ListenerInstancePostService that processes its listener events in chronological order. If message processing fails, the processing of subsequent messages is halted and retries are attempted. This ensures that messages are neither lost nor out of order.
   
   1. The alert module can also send workflow/task change events and server crash events, and we already have an alert system; it's not a good idea to add a new listener system. If you want to build this feature, you should design it based on alert rather than introduce a new mechanism.
   2. I don't think `without modifying the source code` is a reason to add this feature.
   3. Making sure events are never lost is not a good design goal; if you pursue it you will make DS vulnerable. DS is used to trigger workflows, not as a notification system, and no notify task should affect DS availability or workflow results. If a workflow executes successfully but the notify subsystem fails, should the workflow fail and block?




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "ruanwenjun (via GitHub)" <gi...@apache.org>.
ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1729854160

   > The design of this OSPP project:
   > 
   > 1. A system-generated message can be considered a kind of `ListenerEvent`; we can add various kinds of events such as `ServerDownListenerEvent`, `TaskFailListenerEvent`, and `TaskCreateListenerEvent`.
   > 2. Add an option for creating a global alert instance. When creating an alert instance, the user can choose whether it is a global instance. If it is global, it will by default receive all system-generated events. If it is not, it will be a regular alert instance, and the user needs to select the alarm type (success, failure, all) and bind it to an alert group.
   > 3. Event generator: if there exists a global alert instance, the global events will be generated (in the `Api Server` or `Master Server`) and saved into the database (in a separate table, say `t_ds_listener_event`). I think it is best to check whether a global instance exists before generating global events, because we have no other way to determine whether users need to handle global events; if not, we would do a lot of useless work processing these events.
   > 4. Event consumer: create a new daemon thread, similar to `AlertBootstrapService`, that reads pending events from `t_ds_listener_event`, retrieves the global alert instances, and builds the AlertData; the corresponding `AlertChannel` (AlertPlugin) then processes the alert info. On success, delete the event from the table (because the number of global events will be large, deleting directly may be the more suitable option). On failure, update the send status.
   
   Basically LGTM




[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1702831830

   @ruanwenjun @zixi0825 
   
   I think we should discuss these features first before deciding whether to create a new listener module or build on the existing alert module.
   
   First and foremost, we agree on this consensus: we cannot guarantee that events/alerts are never lost, because we don't want to make DolphinScheduler fragile in pursuit of that guarantee.
   
   1. Do we need **global alarm types**? This means an alert instance accepts all events by default, without being bound to a specific workflow.
       1. Overhead: adding a global alarm type incurs extra cost. We need to query the database for global alert instances before creating events. For example, when a workflow starts, we query the database first, and only if global alert instances exist do we create a WorkflowStart event and save it to the database.
       2. Abstract interface: if a global alarm type is added, many events will need to be stored and retrieved later. Do we need to design an interface for saving and retrieving events? That would let us store events elsewhere (e.g., Redis or another database) besides the current database. The save method would be wrapped in a try-catch block so that storage failures do not affect the normal execution of the workflow.
   2. Do we need **more alarm event types**, such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.?
   3. **Flexibility:** in the current alert module, the title and content are determined when the master server creates an alert, so the message format is the same across alert plugins. Do we need more flexibility? Similar to a KafkaListener, plugins could generate messages in different formats and apply different processing logic to different event types.
   4. **Failed alert messages:** should we resend alert messages after they fail to send?
       1. If we resend failed messages: to ensure that messages sent by the same alert instance stay in order (e.g., workflowStartEvent should precede workflowEndEvent) without affecting other instances, the current message processing method needs to change so that each alert instance processes its events in chronological order.
   5. **Dynamic plugins:** in the design of this listener module, users can upload a custom plugin JAR, which is dynamically loaded through a URLClassLoader, and can install/update/uninstall the plugin JAR on a separate page in the `Security Module`. If we build on the alert module, should we keep this design? #14817 asked for a page that supports uploading JARs of custom task plugins; it may be better to keep this design and extend it to manage various types of dynamic plugins (alert plugins, task plugins, etc.).
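   As a rough illustration of the dynamic-plugin idea in point 5, the install/uninstall flow could look like the sketch below. All class and method names here are hypothetical, not actual DolphinScheduler APIs; the only real mechanism is giving each plugin JAR its own `URLClassLoader` so it can be installed, replaced, or removed without restarting the service.

```java
// Hypothetical registry for dynamically loaded listener plugins.
// Each plugin JAR gets an isolated URLClassLoader; closing the loader on
// uninstall releases the JAR file so it can be replaced without a restart.
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

class ListenerPluginRegistry {
    private final Map<String, URLClassLoader> loaders = new HashMap<>();

    /** Install a plugin: load its entry class from the uploaded JAR(s). */
    Class<?> install(String pluginName, URL[] jarUrls, String entryClass) {
        try {
            URLClassLoader loader =
                    new URLClassLoader(jarUrls, ListenerPluginRegistry.class.getClassLoader());
            loaders.put(pluginName, loader);
            return loader.loadClass(entryClass);
        } catch (Exception e) {
            throw new RuntimeException("failed to install plugin " + pluginName, e);
        }
    }

    /** Uninstall: close the class loader so the JAR file can be replaced. */
    void uninstall(String pluginName) {
        URLClassLoader loader = loaders.remove(pluginName);
        if (loader != null) {
            try {
                loader.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    boolean isInstalled(String pluginName) {
        return loaders.containsKey(pluginName);
    }
}
```

   A real version would also validate that the entry class implements the `ListenerPlugin` interface before registering it.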

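   For the abstract interface question in point 1.2, one possible shape is sketched below: the storage backend is pluggable (database, Redis, ...) and the publish path swallows failures so that workflow execution is never affected. All names are illustrative, not existing DolphinScheduler code.

```java
// Hypothetical event-storage abstraction: swap the backend without touching
// the workflow code, and keep event delivery strictly best-effort.
import java.util.ArrayList;
import java.util.List;

interface ListenerEventStorage {
    void save(String eventJson);   // may throw RuntimeException on backend failure
    List<String> poll(int limit);
}

class InMemoryListenerEventStorage implements ListenerEventStorage {
    private final List<String> events = new ArrayList<>();

    @Override
    public synchronized void save(String eventJson) {
        events.add(eventJson);
    }

    @Override
    public synchronized List<String> poll(int limit) {
        int n = Math.min(limit, events.size());
        List<String> batch = new ArrayList<>(events.subList(0, n));
        events.subList(0, n).clear();
        return batch;
    }
}

class ListenerEventPublisher {
    private final ListenerEventStorage storage;

    ListenerEventPublisher(ListenerEventStorage storage) {
        this.storage = storage;
    }

    /** Best-effort save: a storage failure never propagates into the workflow. */
    boolean tryPublish(String eventJson) {
        try {
            storage.save(eventJson);
            return true;
        } catch (RuntimeException e) {
            return false; // log-and-continue in a real implementation
        }
    }
}
```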



[GitHub] [dolphinscheduler] weixiaonan1 commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "weixiaonan1 (via GitHub)" <gi...@apache.org>.
weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1724063265

   The design of this OSPP project:
   1. A system-generated message can be considered a kind of `ListenerEvent`; we can add various kinds of events such as `ServerDownListenerEvent`, `TaskFailListenerEvent`, and `TaskCreateListenerEvent`.
   2. Add an option for creating a global alert instance. When creating an alert instance, the user can choose whether it is a global instance. A global instance receives all system-generated events by default. A non-global instance remains a regular alert instance: the user selects the alarm type (success, failure, all) and binds it to an alert group.
   3. Event generator: if a global alert instance exists, global events are generated (in the `Api Server` or `Master Server`) and saved into the database (in a separate table, say `t_ds_listener_event`). It is best to check whether a global instance exists before generating global events, because we have no other way to determine whether users need them; otherwise we would do a lot of useless work processing these events.
   4. Event consumer: create a new daemon thread, similar to `AlertBootstrapService`, that reads pending events from `t_ds_listener_event`, retrieves the global alert instances, builds the `AlertData`, and lets the corresponding `AlertChannel` (alert plugin) process it. If delivery succeeds, delete the event from the table (since the number of global events will be large, deleting directly may be the more suitable option); if it fails, update the send status.
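   A very rough sketch of the consumer loop in step 4, with an in-memory deque standing in for the `t_ds_listener_event` table and a simplified `AlertChannel`; the real implementation would poll the database in batches from a daemon thread, and all names besides `t_ds_listener_event`/`AlertChannel` are made up for illustration.

```java
// Simplified consumer loop: deliver pending events in order; delete on
// success, keep with a failed send-status on failure.
import java.util.ArrayDeque;
import java.util.Deque;

class ListenerEvent {
    final long id;
    final String payload;
    ListenerEvent(long id, String payload) { this.id = id; this.payload = payload; }
}

interface AlertChannel {
    /** @return true if the event was delivered successfully */
    boolean process(ListenerEvent event);
}

class ListenerEventConsumer {
    private final Deque<ListenerEvent> pending = new ArrayDeque<>(); // stands in for t_ds_listener_event
    private final Deque<ListenerEvent> failed = new ArrayDeque<>();  // rows with send status = FAILED
    private final AlertChannel channel;

    ListenerEventConsumer(AlertChannel channel) { this.channel = channel; }

    void enqueue(ListenerEvent e) { pending.addLast(e); }

    /** One polling round over the pending events; returns how many were delivered. */
    int drainOnce() {
        int delivered = 0;
        while (!pending.isEmpty()) {
            ListenerEvent e = pending.pollFirst();
            if (channel.process(e)) {
                delivered++;       // success: the row is deleted outright
            } else {
                failed.addLast(e); // failure: keep it and update the send status
            }
        }
        return delivered;
    }

    int failedCount() { return failed.size(); }
}
```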




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #14832: [Feature][Listener]Implementation of Listener Mechanism

Posted by "ruanwenjun (via GitHub)" <gi...@apache.org>.
ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1729844511

   > > The event generator doesn't need to care about whether an event consumer exists.
   > 
   > I think it is necessary to check whether an event consumer exists. The number of events can be large; if no consumer exists, we would do a lot of useless work: generate events, save them to the DB, read them back from the DB, and delete them because no instance can process them. So it would be better to have a switch that controls whether events are generated, and querying t_ds_alert_plugin_instance is a good option. That table is small, and we can query it just once when constructing the `WorkflowExecuteRunnable` to control whether WorkflowStart/End/Fail and TaskStart/End/Fail events are generated.
   
   
   Injecting the event consumer info into `WorkflowExecuteRunnable` LGTM.
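   The check-once-at-construction idea might look roughly like this sketch, where `globalInstanceExists` stands in for a single query on `t_ds_alert_plugin_instance`; the class and method names are illustrative, not the real `WorkflowExecuteRunnable` API.

```java
// Gate event generation behind a flag captured once when the runner is built:
// if no global alert instance exists, no events are generated at all.
import java.util.ArrayList;
import java.util.List;

class WorkflowRunner {
    private final boolean listenerEnabled;
    private final List<String> emittedEvents = new ArrayList<>();

    /** The existence check is done once, at construction time. */
    WorkflowRunner(boolean globalInstanceExists) {
        this.listenerEnabled = globalInstanceExists;
    }

    private void emit(String eventType) {
        if (!listenerEnabled) {
            return; // no consumer -> skip generate/save/read/delete entirely
        }
        emittedEvents.add(eventType);
    }

    List<String> run() {
        emit("WORKFLOW_START");
        // ... schedule tasks, each emitting TASK_START / TASK_END ...
        emit("WORKFLOW_END");
        return emittedEvents;
    }
}
```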

