Posted to dev@zookeeper.apache.org by "Gao,Wei" <We...@Arcserve.com> on 2019/09/26 01:00:30 UTC

RE: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

Hi Alexander Shraer,
 Could you please tell me how to implement automation on top? 
Thank you very much!

-----Original Message-----
From: Alexander Shraer (Jira) <ji...@apache.org> 
Sent: Thursday, September 26, 2019 1:27 AM
To: issues@zookeeper.apache.org
Subject: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down


    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937925#comment-16937925 ]

Alexander Shraer commented on ZOOKEEPER-3556:
---------------------------------------------

The described behavior is not a bug – currently reconfiguration requires explicit action by an operator. One could implement automation on top. We should consider this as a feature, since it sounds like several adopters have implemented such automation. Perhaps one of them could contribute this upstream.

> Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down
> ---------------------------------------------------------------------------------------------------------------
>
>
>                 Key: ZOOKEEPER-3556
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3556
>             Project: ZooKeeper
>          Issue Type: Wish
>          Components: java client
>    Affects Versions: 3.5.5
>            Reporter: Steven Chan
>            Priority: Major
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I encountered a problem that blocks my development of load balancing
> using ZooKeeper 3.5.5.
> I have a ZooKeeper cluster that comprises five zk servers, and the
> dynamic configuration file is as follows:
>
>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>
> The zk cluster works fine if every member works normally. However, if,
> say, two of them suddenly go down without prior notice, the dynamic
> configuration file shown above is not synchronized dynamically, which
> causes the zk cluster to fail to work normally.
> In my view, if server 1 and server 5 suddenly go down, the dynamic
> configuration file should be modified as follows:
>
>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>
> But in this case, the dynamic configuration file never changes
> automatically unless you revise it manually.
> I think this is a very common case that may happen at any time. If
> so, how can we handle it?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
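For reference, each `server.N` line in the issue has the form `server.<id>=<host>:<quorumPort>:<electionPort>:<role>;<clientAddress>:<clientPort>`. Under ZooKeeper's default majority quorums, a five-participant ensemble needs three servers to make progress, so losing servers 1 and 5 still leaves a quorum; the dynamic file simply keeps listing them until someone issues a reconfig. The Python sketch below parses such lines and computes the quorum size. The names (`parse_dynamic_config`, `quorum_size`, `Member`) are hypothetical helpers, not part of any ZooKeeper API, and the pattern assumes hostnames or IPv4 addresses only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for one dynamic-config line, e.g.
#   server.1=zk1:2888:3888:participant;0.0.0.0:2181
# Assumption: hostnames/IPv4 only; bracketed IPv6 addresses are not handled.
SERVER_LINE = re.compile(
    r"^server\.(\d+)=([^:;\s]+):(\d+):(\d+)(?::(participant|observer))?(?:;(\S+))?$"
)


@dataclass(frozen=True)
class Member:
    sid: int                # server id, the N in server.N
    host: str
    quorum_port: int        # follower <-> leader port (2888 in the issue)
    election_port: int      # leader election port (3888 in the issue)
    role: str               # "participant" when the role is omitted
    client: Optional[str]   # "[address:]port" after the semicolon, if any


def parse_dynamic_config(text: str) -> List[Member]:
    """Collect server.N lines; other lines (e.g. version=...) are ignored."""
    members = []
    for raw in text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            sid, host, qport, eport, role, client = match.groups()
            members.append(Member(int(sid), host, int(qport), int(eport),
                                  role or "participant", client))
    return members


def quorum_size(members: List[Member]) -> int:
    """Majority of voting members; observers do not vote."""
    voters = sum(1 for m in members if m.role == "participant")
    return voters // 2 + 1


if __name__ == "__main__":
    config = "\n".join(
        f"server.{i}=zk{i}:2888:3888:participant;0.0.0.0:2181" for i in range(1, 6)
    )
    ensemble = parse_dynamic_config(config)
    q = quorum_size(ensemble)
    print(f"{len(ensemble)} voters, quorum {q}, tolerates {len(ensemble) - q} failures")
```

Run as a script, this prints `5 voters, quorum 3, tolerates 2 failures` for the five-server configuration from the issue.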

Re: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

Posted by Alexander Shraer <sh...@gmail.com>.

Exactly, thank you, Michael :)

On Wed, Sep 25, 2019 at 9:32 PM Michael Han <ha...@apache.org> wrote:

> >> There was recently a post here from someone who has implemented this
>
> Maybe this one?
>
> http://zookeeper-user.578899.n2.nabble.com/About-ZooKeeper-Dynamic-Reconfiguration-td7584271.html

RE: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

Posted by "Gao,Wei" <We...@Arcserve.com>.

Hi Michael Han,
Thank you so much for your reply!

From: Michael Han <ha...@apache.org>
Sent: Thursday, September 26, 2019 12:32 PM
To: dev@zookeeper.apache.org
Cc: issues@zookeeper.apache.org
Subject: Re: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

>> There was recently a post here from someone who has implemented this

Maybe this one?
http://zookeeper-user.578899.n2.nabble.com/About-ZooKeeper-Dynamic-Reconfiguration-td7584271.html

Re: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

Posted by Michael Han <ha...@apache.org>.

>> There was recently a post here from someone who has implemented this

Maybe this one?
http://zookeeper-user.578899.n2.nabble.com/About-ZooKeeper-Dynamic-Reconfiguration-td7584271.html

On Wed, Sep 25, 2019 at 9:19 PM Alexander Shraer <sh...@gmail.com> wrote:

> There was recently a post here from someone who has implemented this, but
> I couldn't find it for some reason.
>
> Essentially, I think you'd need to monitor the "health" of servers and
> their connectivity to the leader, and issue reconfig commands to
> remove them when you suspect that they're down, or add them back when you
> think they're up.
> Note that you always need at least a quorum of the ensemble, so
> issuing a reconfig command (or any other command) once a quorum is lost
> won't work.
> You could use the information exposed by ZK's four-letter-word commands to
> decide whether you think a server is up and connected to the quorum or down.
> Ideally we could also use the leader's view of who is connected,
> but it doesn't look like this is exposed right now. You can also
> periodically issue test read/write operations on various servers to check
> whether they're really operational.
>
> https://github.com/apache/zookeeper/blob/1ca627b5a3105d80ed4d851c6e9f1a1e2ac7d64a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md#sc_4lw
>
> As accurate failure detection is impossible in asynchronous systems, you'll
> need to decide how to trade off sensitivity to potential failures against
> false suspicions.
>
> Hope this helps...
>
> Alex

Re: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

Posted by Alexander Shraer <sh...@gmail.com>.

There was recently a post here from someone who has implemented this, but
I couldn't find it for some reason.

Essentially, I think you'd need to monitor the "health" of servers and
their connectivity to the leader, and issue reconfig commands to
remove them when you suspect that they're down, or add them back when you
think they're up.
Note that you always need at least a quorum of the ensemble, so
issuing a reconfig command (or any other command) once a quorum is lost
won't work.
You could use the information exposed by ZK's four-letter-word commands to
decide whether you think a server is up and connected to the quorum or down.
Ideally we could also use the leader's view of who is connected,
but it doesn't look like this is exposed right now. You can also
periodically issue test read/write operations on various servers to check
whether they're really operational.
https://github.com/apache/zookeeper/blob/1ca627b5a3105d80ed4d851c6e9f1a1e2ac7d64a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md#sc_4lw
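Four-letter-word commands are plain-text requests sent to a server's client port. Below is a minimal Python sketch of a health probe based on `srvr`, assuming the command is permitted by the server's `4lw.commands.whitelist` setting (in 3.5.x, `srvr` is, as far as I know, the only command whitelisted by default). The helper names are hypothetical. A server that is part of an active quorum reports `Mode: leader`, `Mode: follower`, or `Mode: observer`; one that has lost the quorum typically replies with a short "not currently serving requests" message instead, which this parser treats as unhealthy.

```python
import socket


def send_four_letter_word(host: str, port: int, cmd: str = "srvr",
                          timeout: float = 2.0) -> str:
    """Send a 4lw command over a plain TCP connection and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(reply: str) -> dict:
    """Split the 'Key: value' lines of a srvr reply into a dict."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_serving(fields: dict) -> bool:
    """Treat a server as healthy only if it reports a quorum role."""
    return fields.get("Mode") in {"leader", "follower", "observer"}
```

For example, `is_serving(parse_srvr(send_four_letter_word("zk2", 2181)))` would probe server 2 from the issue's configuration, with host and client port taken from its `server.2` line. A timeout or refused connection raises `OSError`, which a caller would count as a failed probe.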

As accurate failure detection is impossible in asynchronous systems, you'll
need to decide how to trade off sensitivity to potential failures against
false suspicions.
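One way to act on that trade-off is to suspect a server only after several consecutive failed probes, and to refuse to act when the servers still believed alive are not a majority of the current voters. The Python sketch below is hypothetical: `Suspector`, `plan_removal`, and the default threshold of 3 are illustrative assumptions, not part of ZooKeeper. It only builds a zkCli `reconfig -remove` command string and does not contact the ensemble. Actually issuing the command assumes dynamic reconfiguration is enabled (`reconfigEnabled`, which defaults to false since 3.5.3) and that the client is authorized to reconfigure.

```python
from collections import defaultdict
from typing import Iterable, Optional, Set


class Suspector:
    """Suspect a server only after `threshold` consecutive failed probes.

    A higher threshold means fewer false suspicions but a slower reaction
    to real failures; no single value is right for every deployment.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.misses = defaultdict(int)  # server id -> consecutive failures

    def record(self, sid: int, healthy: bool) -> None:
        # Any successful probe resets the count for that server.
        self.misses[sid] = 0 if healthy else self.misses[sid] + 1

    def suspected(self) -> Set[int]:
        return {sid for sid, n in self.misses.items() if n >= self.threshold}


def plan_removal(voters: Iterable[int], suspected: Set[int]) -> Optional[str]:
    """Build a zkCli reconfig command, or None if nothing should be done.

    Refuses when the servers still believed alive are not a majority of
    the current voters: without a quorum the reconfig cannot succeed.
    """
    voters = sorted(set(voters))
    to_remove = [sid for sid in voters if sid in suspected]
    alive = len(voters) - len(to_remove)
    if not to_remove or alive < len(voters) // 2 + 1:
        return None
    return "reconfig -remove " + ",".join(str(sid) for sid in to_remove)
```

With servers 1 and 5 suspected out of five voters, `plan_removal` returns `reconfig -remove 1,5`. After that removal the ensemble has three voters and tolerates only one more failure, so a complete tool would also add servers back (for example with `reconfig -add`) once they look healthy again, as suggested above.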

Hope this helps...

Alex

On Wed, Sep 25, 2019 at 6:00 PM Gao,Wei <We...@arcserve.com> wrote:

> Hi Alexander Shraer,
>  Could you please tell me how to implement automation on top?
> Thank you very much!