Posted to dev@cloudstack.apache.org by Sean Lair <sl...@ippathways.com> on 2017/04/10 21:53:40 UTC

How are router checks scheduled?

According to my management server logs, some of the periodic checks are getting kicked off twice at the same time.  The CheckRouterTask is kicked off every 30 seconds, but each time it runs, it runs twice within the same second.  See the logs below for an example:

2017-04-10 21:48:12,879 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-5f7bc584) (logid:4d5b1031) Found 10 routers to update status.
2017-04-10 21:48:12,932 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-d027ab6f) (logid:1bc50629) Found 10 routers to update status.
2017-04-10 21:48:42,877 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-2c8f4d18) (logid:e9111785) Found 10 routers to update status.
2017-04-10 21:48:42,927 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-1bfd5351) (logid:ad0f95ef) Found 10 routers to update status.
2017-04-10 21:49:12,874 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-ede0d2bb) (logid:6f244423) Found 10 routers to update status.
2017-04-10 21:49:12,928 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-d58842d5) (logid:8442d73c) Found 10 routers to update status.

How is this scheduled/kicked off?  I am debugging some site-to-site VPN alert problems, and they seem to be related to a race condition caused by the "CheckRouterTask" being kicked off two at a time.

Thanks
Sean




RE: How are router checks scheduled?

Posted by Sean Lair <sl...@ippathways.com>.
Yep! Exactly, we have that issue too.  I am testing a possible fix right now and will let you know how it goes!






Re: How are router checks scheduled?

Posted by Simon Weller <sw...@ena.com>.
We've seen something very similar. By any chance, are you seeing any strange CPU load issues that grow over time?

Our team has been chasing down an issue that appears to be related to S2S VPN checks, where a race condition seems to occur that threads out the CPU over time.







RE: How are router checks scheduled?

Posted by Sean Lair <sl...@ippathways.com>.
I do have two mgmt servers, but I have one powered off.  The log excerpt is from one management server.  This can be checked in the environment by running:

grep "routers to update status" /var/log/cloudstack/management/management-server.log
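
For anyone who wants to automate that check, here is a minimal Python sketch that flags status runs logged less than one second apart. It assumes the log line layout shown in the excerpt from the original post; the names close_pairs, LINE_RE and SAMPLE are made up for illustration and are not part of CloudStack.

```python
import re
from datetime import datetime, timedelta

# Captures the timestamp of a router status line such as
# "2017-04-10 21:48:12,879 DEBUG [...] Found 10 routers to update status."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*routers to update status"
)


def close_pairs(lines, window=timedelta(seconds=1)):
    """Return (earlier, later) timestamp pairs logged less than `window` apart."""
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            # %f accepts the three-digit millisecond field after the comma.
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    stamps.sort()
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a < window]


# Four lines copied from the log excerpt in the original post.
SAMPLE = [
    "2017-04-10 21:48:12,879 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-5f7bc584) (logid:4d5b1031) Found 10 routers to update status.",
    "2017-04-10 21:48:12,932 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-d027ab6f) (logid:1bc50629) Found 10 routers to update status.",
    "2017-04-10 21:48:42,877 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-2c8f4d18) (logid:e9111785) Found 10 routers to update status.",
    "2017-04-10 21:48:42,927 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-1bfd5351) (logid:ad0f95ef) Found 10 routers to update status.",
]

if __name__ == "__main__":
    for earlier, later in close_pairs(SAMPLE):
        print(f"{earlier} and {later} are {(later - earlier).total_seconds():.3f}s apart")
```

To run it against a real log, pass an open file instead of SAMPLE, for example close_pairs(open(path)) with path pointing at management-server.log.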

This is happening both in prod and our dev environment.  I've been digging through the code and have some ideas and will post back later if successful in correcting the issue.  

The biggest problem is the race condition between the two simultaneous S2S VPN checks.  They step on each other and spam the heck out of us with the email alerting.
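
To illustrate the kind of overlap described above, here is a rough Python sketch of a generic guard that skips a run when the previous one is still in progress. This is not CloudStack code and not the fix being tested; NonOverlappingTask, check_routers and demo are hypothetical names, and the real CheckRouterTask is Java.

```python
import threading


class NonOverlappingTask:
    """Wrap a callable so that an invocation overlapping another is skipped."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()

    def run(self):
        # Non-blocking acquire: if another run holds the lock, skip this one.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self._func()
            return True
        finally:
            self._lock.release()


def demo():
    started = threading.Event()
    finish = threading.Event()
    runs = []

    def check_routers():
        # Hypothetical stand-in for a periodic router/VPN status check.
        runs.append("run")
        started.set()
        finish.wait(timeout=5)

    task = NonOverlappingTask(check_routers)
    first = threading.Thread(target=task.run)
    first.start()
    started.wait(timeout=5)   # the first run is now in progress
    second_ran = task.run()   # overlapping call: skipped, returns False
    finish.set()              # let the first run complete
    first.join()
    third_ran = task.run()    # no overlap any more: runs normally
    return runs, second_ran, third_ran


if __name__ == "__main__":
    print(demo())
```

A guard like this only hides the symptom. It does not explain why the task is being scheduled twice in the first place.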







RE: How are router checks scheduled?

Posted by Simon Weller <sw...@ena.com>.
Do you have 2 management servers?

Simon Weller/615-312-6068
