Posted to dev@dolphinscheduler.apache.org by lidong dai <da...@gmail.com> on 2020/10/17 14:21:14 UTC

Re: Re: About the high availability implementation of the Alert service

Hi,

+1 for the simple implementation.

By the way, I think HA for the Alert service is not very important compared
with the Master/Worker Server; master-standby alert services could be
implemented in the future when needed.


Best Regards
---------------
DolphinScheduler(Incubator) PPMC
Lidong Dai 代立冬
dailidong66@gmail.com
---------------


On Tue, Sep 29, 2020 at 12:01 PM felix <fe...@thinkingdata.cn> wrote:

> I want to keep it simple and add an exclusive lock when the Alert service
> queries the database, so that even if two alert services are started,
> alerts are not sent more than once.
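The exclusive-lock idea described above can be illustrated with a minimal sketch. It is not DolphinScheduler code: the class and method names are hypothetical, and an in-memory map stands in for the alert table so the example runs without a database. The atomic "PENDING to CLAIMED" transition plays the role that a row lock or a conditional update would play in a real database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two alert service instances poll the same "table",
// but each alert is claimed (and therefore sent) by exactly one of them.
class AlertClaimSketch {
    enum Status { PENDING, CLAIMED }

    // Assumption: stand-in for the alert table, mapping alert id to status.
    private final Map<Integer, Status> alertTable = new ConcurrentHashMap<>();

    void addPendingAlert(int id) {
        alertTable.put(id, Status.PENDING);
    }

    // Atomically flips PENDING to CLAIMED. Only one caller can succeed per
    // alert, which is the guarantee an exclusive lock would give.
    boolean tryClaim(int id) {
        return alertTable.replace(id, Status.PENDING, Status.CLAIMED);
    }

    // One polling pass of one instance: returns the ids it claimed and
    // would therefore send.
    List<Integer> pollAndSend() {
        List<Integer> sent = new ArrayList<>();
        for (Integer id : alertTable.keySet()) {
            if (tryClaim(id)) {
                sent.add(id);
            }
        }
        return sent;
    }

    // Runs two instances concurrently and returns how many alerts were
    // sent in total across both.
    static int runTwoInstances(int alertCount) {
        AlertClaimSketch table = new AlertClaimSketch();
        for (int id = 1; id <= alertCount; id++) {
            table.addPendingAlert(id);
        }
        AtomicInteger totalSent = new AtomicInteger();
        Runnable instance = () -> totalSent.addAndGet(table.pollAndSend().size());
        Thread a = new Thread(instance);
        Thread b = new Thread(instance);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return totalSent.get();
    }

    public static void main(String[] args) {
        System.out.println("alerts sent: " + runTwoInstances(100));
    }
}
```

Against a real database, the same effect would typically come from a locking read or a conditional status update inside a transaction; which of these fits the Alert service is left open here.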
>
>
>  Original message
> From: felix@thinkingdata.cn<fe...@thinkingdata.cn>
> To: dev<de...@dolphinscheduler.apache.org>
> Sent: Monday, August 24, 2020, 12:50
> Subject: Re: Re: About the high availability implementation of the Alert service
>
>
> Can a single instance achieve high availability?
> Of course, there are many ways to implement high availability.
>
>
> felix@thinkingdata.cn
>
> From: wu shaoj
> Date: 2020-08-24 12:36
> To: dev@dolphinscheduler.apache.org
> Subject: Re: About the high availability implementation of the Alert
> service
> I think there's no relationship at all between stability and running
> multiple instances.
>
>
> On 2020/8/24, 11:17, "felix@thinkingdata.cn" <fe...@thinkingdata.cn>
> wrote:
>
>     Just to be clear, what I am talking about is the stability of the Alert
> Server, which is a different requirement from customization of the
> alert service. Invoking the user's own alerts implemented through plug-ins
> only makes sense when the Alert Server is up and running. I only agree
> that this can be postponed in the schedule, or I can implement it myself
> when I have time. But I don't agree with lowering the stability criteria
> for DS.
>
>
>
>     felix@thinkingdata.cn
>
>     From: felix@thinkingdata.cn
>     Date: 2020-08-24 11:10
>     To: dev
>     Subject: Re: Re: About the high availability implementation of the
> Alert service
>
>     At the very least, the Alert service should support multiple instances.
> That way, an exception can be reported immediately.
>     Customized alerts can be plugins implemented by the user, but the
> alert service is the basis for DS outgoing alerts, and the stability of
> this service is necessary. No one will accept that no alert can be sent
> when the scheduling platform has a problem.
>     Also, it is unreasonable to leave service-level high availability for
> users to implement on their own. This is an architectural design issue,
> not a matter of custom requirements. Service stability is a common
> requirement, not a custom requirement.
>
>
>
>
>     felix@thinkingdata.cn
>     From: wu shaoj
>     Date: 2020-08-24 10:50
>     To: dev@dolphinscheduler.apache.org
>     Subject: Re: About the high availability implementation of the Alert
> service
>     I don't think HA for alerts is necessary at present or in the
> future. Users can add this extension themselves.
>     On 2020/8/23, 10:44, "Yichao Yang" <10...@qq.com> wrote:
>         Hi,
>         I don't think HA for alerts is necessary at present. Users can
> add this extension themselves. We should focus on the current
> scheduling.
>         Best,
>         Yichao Yang
>         ------------------ Original ------------------
>         From: JUN GAO <gaojun2048@gmail.com>
>         Date: Sat, Aug 22, 2020 9:41 PM
>         To: dev <dev@dolphinscheduler.apache.org>
>         Subject: Re: About the high availability implementation of the
> Alert service
>         I think the first one is better.
>         On Sat, Aug 22, 2020 at 19:30, felix@thinkingdata.cn
> <felix@thinkingdata.cn> wrote:
>         > Hi all,
>         >
>         > I would like to raise a suggestion. The Alert module is not
>         > currently designed for high availability: when multiple alert
>         > services are started, alerts are sent repeatedly, and when the
>         > alert service goes down, DS alerting stops working.
>         > So far, I've come up with two architectures that solve the
>         > problem of repeated alert messages while also making the Alert
>         > module highly available.
>         >
>         > 1. The first is a master-slave relationship between the alert
>         > services through ZK. Only the master node is responsible for
>         > sending alerts. If the master node goes down, a new master is
>         > elected, and the new master node continues to provide the alert
>         > service.
>         > 2. The second is a decentralized design in which all alert
>         > services work simultaneously and use exclusive locks between
>         > them to ensure that alert messages are not sent repeatedly.
>         >
>         > If anyone has a better plan, we can discuss it together.
>         >
>         > Thanks
>         >
>         > felix@thinkingdata.cn
>         >
>         --
>         DolphinScheduler(Incubator) PPMC
>         Jun Gao 高俊
>         gaojun2048@gmail.com