Posted to dev@servicecomb.apache.org by bismy <bi...@qq.com> on 2018/05/14 04:03:52 UTC

[Discussion]About service instances discovery reliable problems

Hi All,


We have run into a reliability problem. When the service center restarts, it clears all service instance information.
When the SDK (java-chassis) then queries the instance list during its periodic refresh, it gets an empty list and invocations fail.


To resolve this problem, two solutions are suggested:
1. The service center provides an instance persistence mechanism. When the service center restarts, it restores the instance information
and recalculates the timeout information (e.g. it resets each instance's last active time to the startup time). If it then receives a heartbeat from an instance, that instance is not removed. Instances that send no heartbeat before the timeout
are removed in the normal way.
2. The SDK takes special care when removing instances. When it gets an empty list from the service center, the SDK does not actually remove the instances; instead, it pings them. If the ping returns
OK, the instance is not removed.
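Solution 1 could be sketched as follows. This is a minimal in-memory illustration that assumes a fixed heartbeat timeout. The class and method names (RestoringRegistry, restore, heartbeat, evictExpired) are hypothetical and do not come from the service center code base. Java is used only for consistency with java-chassis; the timing logic does not depend on the language. The key point is that restored entries get a fresh last-active time, so live instances have one full timeout window to send a heartbeat before normal eviction applies.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of solution 1, not service center code.
class RestoringRegistry {
    private final Duration heartbeatTimeout;
    // instanceId -> endpoint, restored from storage or registered later
    private final Map<String, String> endpoints = new HashMap<>();
    // instanceId -> last heartbeat time (or the restart time, for restored entries)
    private final Map<String, Instant> lastActive = new HashMap<>();

    RestoringRegistry(Duration heartbeatTimeout) {
        this.heartbeatTimeout = heartbeatTimeout;
    }

    // Called once at startup with the instances persisted before the restart.
    void restore(Map<String, String> persisted, Instant startupTime) {
        endpoints.putAll(persisted);
        for (String instanceId : persisted.keySet()) {
            // The old timestamp is meaningless after downtime, so restart the clock.
            lastActive.put(instanceId, startupTime);
        }
    }

    // A heartbeat from a known instance keeps it alive; unknown instances must re-register.
    boolean heartbeat(String instanceId, Instant now) {
        if (!endpoints.containsKey(instanceId)) {
            return false;
        }
        lastActive.put(instanceId, now);
        return true;
    }

    // The normal timeout rule: drop instances whose last heartbeat is older than the timeout.
    void evictExpired(Instant now) {
        Iterator<Map.Entry<String, Instant>> it = lastActive.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> e = it.next();
            if (Duration.between(e.getValue(), now).compareTo(heartbeatTimeout) > 0) {
                endpoints.remove(e.getKey());
                it.remove();
            }
        }
    }

    boolean contains(String instanceId) {
        return endpoints.containsKey(instanceId);
    }
}
```

For example, with a 120-second timeout, an instance that sends a heartbeat 60 seconds after the restart is still present at a check 150 seconds after the restart. A restored instance that never reports is evicted at that check.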


Known consequences:
Solution 2:
  a. It conflicts with the service center's white/black list rules.
  b. In Docker or other scenarios where instances change frequently, the same ip/port is reused by many services as they start and stop, and their health URLs may also be the same. This produces many 400-like errors while the instance list is not updated.
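Solution 2 could be sketched in Java as follows. EmptyListProtection is a hypothetical class, not a java-chassis API. The ping is passed in as a Predicate<String> over endpoint strings and stands in for whatever health check the SDK would actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of solution 2: an empty list from the service center does not
// wipe the known instances; only those that fail a ping are dropped.
class EmptyListProtection {
    private final Predicate<String> ping; // assumed health check on "ip:port"
    private List<String> current = new ArrayList<>();

    EmptyListProtection(Predicate<String> ping) {
        this.ping = ping;
    }

    List<String> onRefresh(List<String> fromServiceCenter) {
        if (!fromServiceCenter.isEmpty()) {
            // Normal case: trust the service center.
            current = new ArrayList<>(fromServiceCenter);
        } else {
            // Empty list: keep only the previously known instances that still answer.
            List<String> stillAlive = new ArrayList<>();
            for (String endpoint : current) {
                if (ping.test(endpoint)) {
                    stillAlive.add(endpoint);
                }
            }
            current = stillAlive;
        }
        return List.copyOf(current);
    }
}
```

Both consequences above show up directly in this sketch. First, a list emptied on purpose (for example by a white/black rule) is handled exactly like one emptied by a restart. Second, a successful ping only proves that something answers at that ip/port, not that it is still the same service.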


Any suggestions?

Re: [Discussion]About service instances discovery reliable problems

Posted by bismy <bi...@qq.com>.

We have already done this, @Yang Bo, but there are still some complicated scenarios. I have addressed the first of them in this PR: https://github.com/apache/incubator-servicecomb-java-chassis/pull/704

------------------ Original message ------------------
From: "Yang Bo"<oa...@gmail.com>;
Sent: Wednesday, May 16, 2018, 11:36 AM
To: "dev"<de...@servicecomb.apache.org>;

Subject: Re: [Discussion]About service instances discovery reliable problems


Re: [Discussion]About service instances discovery reliable problems

Posted by Yang Bo <oa...@gmail.com>.

We could do something like this:
keep a copy of the instance/metadata information on the client side, so that
when the SC is down, the client can still use the local copy to call
services.
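One way to read this suggestion is sketched below in Java. The client writes every successful query result to a local file and falls back to that copy when the service center cannot be reached. LocalInstanceCache, the one-endpoint-per-line file format, and the Callable that performs the actual query are all assumptions for illustration. None of them is a java-chassis API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical client-side local copy of the instance list.
class LocalInstanceCache {
    private final Path cacheFile; // assumed format: one "ip:port" per line

    LocalInstanceCache(Path cacheFile) {
        this.cacheFile = cacheFile;
    }

    // Queries the service center; if the query fails, returns the last saved copy.
    List<String> query(Callable<List<String>> serviceCenterQuery) {
        List<String> fresh;
        try {
            fresh = serviceCenterQuery.call();
        } catch (Exception e) {
            return readLocalCopy(); // SC unreachable: use the local information
        }
        try {
            Files.write(cacheFile, fresh); // keep the local copy up to date
        } catch (IOException e) {
            // A failed write should not fail the query; the copy on disk just stays older.
        }
        return fresh;
    }

    private List<String> readLocalCopy() {
        if (!Files.exists(cacheFile)) {
            return List.of(); // nothing saved yet
        }
        try {
            return Files.readAllLines(cacheFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On its own, this covers the "SC is down" case, where the query throws. It does not cover the case from the original post, where a restarted SC answers successfully with an empty list. That case would still need something like solution 1 or 2.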


On Wed, May 16, 2018 at 11:15 AM, Willem Jiang <wi...@gmail.com>
wrote:

-- 
Best Regards,
Yang.

Re: [Discussion]About service instances discovery reliable problems

Posted by Willem Jiang <wi...@gmail.com>.

If we treat the service center as an online service, it should provide 24/7
service.
But if we use a standalone service center, it could be a challenge for the
service center to provide 24/7 service.

How should we set up the instance refresh strategy?
We may need to provide different solutions for different use cases.


Willem Jiang

Blog: http://willemjiang.blogspot.com (English)
          http://jnn.iteye.com  (Chinese)
Twitter: willemjiang
Weibo: 姜宁willem

On Mon, May 14, 2018 at 4:52 PM, bismy <bi...@qq.com> wrote:

Re: [Discussion]About service instances discovery reliable problems

Posted by bismy <bi...@qq.com>.

Supporting gray release needs a lot of facilities to make it work, and service center upgrades sometimes cannot use gray release at all.
In other scenarios, such as standalone applications (not cloud services), restarts are quite common. Restarting base services must not disrupt communication between users' services.


------------------ Original message ------------------
From: "wjm wjm"<zz...@gmail.com>;
Sent: Monday, May 14, 2018, 4:23 PM
To: "dev@servicecomb.apache.org"<de...@servicecomb.apache.org>;

Subject: Re: [Discussion]About service instances discovery reliable problems

Re: [Discussion]About service instances discovery reliable problems

Posted by wjm wjm <zz...@gmail.com>.

It's a problem, but business services use gray release, so why do we reject
that solution here?

On Monday, May 14, 2018, bismy <bi...@qq.com> wrote:

Re: [Discussion]About service instances discovery reliable problems

Posted by bismy <bi...@qq.com>.

I mean when all service center instances are stopped and then started again. This is normal during maintenance, e.g. upgrades.

------------------ Original message ------------------
From: "wjm wjm"<zz...@gmail.com>;
Sent: Monday, May 14, 2018, 12:36 PM
To: "dev"<de...@servicecomb.apache.org>;

Subject: Re: [Discussion]About service instances discovery reliable problems

Re: [Discussion]About service instances discovery reliable problems

Posted by wjm wjm <zz...@gmail.com>.

"When service center restarted"

Does that mean one instance of the SC cluster, or the whole SC cluster?
Does restarting even a single instance clear all the information?

2018-05-14 12:03 GMT+08:00 bismy <bi...@qq.com>:
