Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2020/02/24 16:08:35 UTC

[GitHub] [cloudstack] ggoodrich-ipp opened a new pull request #3915: Incorporate VR OOB start checks to properly HA the VR

ggoodrich-ipp opened a new pull request #3915: Incorporate VR OOB start checks to properly HA the VR
URL: https://github.com/apache/cloudstack/pull/3915
 
 
   ## Description
   <!--- Describe your changes in detail -->
   This change edits VirtualNetworkApplianceManagerImpl.java to fix a VM HA problem affecting virtual routers. When a host is determined to be DOWN, CloudStack attempts to VM HA any affected routers. The problem is that a host judged to be DOWN may not actually be down. On KVM, for example, the host is considered DOWN if the agent on the KVM host has been stopped for too long, even though its VMs may still be running just fine. Once the host is considered DOWN, VM HA runs on the router: it first cleans up the router, which unallocates its 169.x.x.x control IP, and then tries to power the router on at another host, which allocates a NEW 169.x.x.x control IP and writes it to the DB. However, since the router is not actually down (only the host is believed to be), the VM HA fails because the vRouter is still running on the problem host.
   
   Next, in this example, when the host agent comes back online, it sends a power report to the management servers, which conclude that the router was powered on out-of-band (OOB). However, the GUI will not show a control IP for the vRouter, and the DB will hold the NEW control IP that was allocated during the failed VM HA event. This leaves us unable to communicate with the vRouter.
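
To make the sequence above concrete, here is a minimal sketch that replays it against a toy in-memory record. Everything in it is hypothetical: `ControlIpMismatchDemo`, `RouterRecord`, `replay`, and the example addresses are illustrative names and values, not CloudStack classes or real data. It only shows why the control IP stored in the DB ends up different from the one the still-running router is actually using.

```java
// Hypothetical model of the failure sequence; none of these names are CloudStack APIs.
final class ControlIpMismatchDemo {

    /** The management server's view of the router, as it would be stored in the DB. */
    static final class RouterRecord {
        String state;
        String controlIp;

        RouterRecord(String state, String controlIp) {
            this.state = state;
            this.controlIp = controlIp;
        }
    }

    /** Replays the sequence and returns the DB record as it looks at the end. */
    static RouterRecord replay(String liveControlIp, String newlyAllocatedIp) {
        RouterRecord db = new RouterRecord("Running", liveControlIp);

        // 1. Host is judged DOWN: VM HA cleans up the router and releases its control IP.
        db.state = "Stopped";
        db.controlIp = null;

        // 2. VM HA tries to start the router on another host and records a NEW control IP.
        //    The start itself fails because the router is still running on the original host.
        db.controlIp = newlyAllocatedIp;

        // 3. The agent reconnects and reports the router as running (treated as an OOB power-on).
        db.state = "Running";
        return db;
    }

    public static void main(String[] args) {
        String live = "169.254.0.10"; // example address the running router still uses
        RouterRecord db = replay(live, "169.254.0.20");
        System.out.println("DB state:        " + db.state);
        System.out.println("DB control IP:   " + db.controlIp);
        System.out.println("Live control IP: " + live);
        System.out.println("Addresses match: " + live.equals(db.controlIp));
    }
}
```

In this model the record ends in state Running with a control IP the router never configured, which corresponds to the unreachable vRouter described above.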
   
   This PR does a simple check that we can still communicate with the vRouter after any OOB power-on occurs. If we can, then we have the correct control IP in the DB and we're good - so we do nothing. If we can't communicate with the vRouter after the OOB power-on, we do a reboot of the vRouter to fix it.
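
The decision described above can be sketched roughly as follows. This is not the code from the PR: `OobStartCheck`, `Action`, and `afterOobPowerOn` are hypothetical names, the reachability probe is passed in as a plain `Predicate` rather than a real connectivity check, and treating a missing control IP as "unreachable" is an assumption made for the sketch.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the post-OOB-start decision; not the actual PR implementation.
final class OobStartCheck {

    enum Action { NONE, REBOOT }

    /**
     * Decides what to do after a router is reported as powered on out-of-band.
     *
     * @param controlIpInDb control IP currently recorded for the router (may be null)
     * @param canReach      probe that reports whether the router answers on a given IP
     */
    static Action afterOobPowerOn(String controlIpInDb, Predicate<String> canReach) {
        // Assumption: a missing control IP is treated the same as an unreachable router.
        boolean reachable = controlIpInDb != null && canReach.test(controlIpInDb);
        // Reachable means the DB holds the correct control IP, so nothing needs fixing.
        return reachable ? Action.NONE : Action.REBOOT;
    }

    public static void main(String[] args) {
        String liveIp = "169.254.0.10";          // example address the router really uses
        Predicate<String> probe = liveIp::equals; // stand-in for a real connectivity check

        System.out.println(afterOobPowerOn("169.254.0.10", probe)); // prints NONE
        System.out.println(afterOobPowerOn("169.254.0.20", probe)); // prints REBOOT
    }
}
```

Passing the probe in as a parameter keeps the decision itself easy to test without a live router.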
   <!-- For new features, provide link to FS, dev ML discussion etc. -->
   <!-- In case of bug fix, the expected and actual behaviours, steps to reproduce. -->
   
   <!-- When "Fixes: #<id>" is specified, the issue/PR will automatically be closed when this PR gets merged -->
   <!-- For addressing multiple issues/PRs, use multiple "Fixes: #<id>" -->
   <!-- Fixes: # -->
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
   - [ ] Breaking change (fix or feature that would cause existing functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [x] Bug fix (non-breaking change which fixes an issue)
   - [x] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ## Screenshots (if appropriate):
   
   ## How Has This Been Tested?
   <!-- Please describe in detail how you tested your changes. -->
   <!-- Include details of your testing environment, and the tests you ran to -->
   <!-- see how your change affects other areas of the code, etc. -->
   I ran this SQL statement to simulate an OOB power-on: it makes CloudStack believe the router is down, after which the host sends a power report stating that the router is running:
   
   -- id = 157 is the row id of the virtual router in the vm_instance table
   `update vm_instance set state='Stopped',power_state='PowerReportMissing',host_id=NULL where id=157;`
   
   I then observed that the router was marked as OOB started and considered healthy, and that no further action was taken.
   
   I then ran the SQL statement above again to make CloudStack believe the router is down, connected to the router via cloudstack-ssh, and took it to runlevel 1 with 'init 1' so that it could no longer be connected to.
   
   I then observed that the router was restarted by CloudStack, and verified this in the logs on the management server.
   
   <!-- Please read the [CONTRIBUTING](https://github.com/apache/cloudstack/blob/master/CONTRIBUTING.md) document -->
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [cloudstack] ggoodrich-ipp commented on issue #3915: Incorporate VR OOB start checks to properly HA the VR

Posted by GitBox <gi...@apache.org>.
ggoodrich-ipp commented on issue #3915: Incorporate VR OOB start checks to properly HA the VR
URL: https://github.com/apache/cloudstack/pull/3915#issuecomment-599737810
 
 
   Any interest in reviewing this?


[GitHub] [cloudstack] DaanHoogland commented on issue #3915: Incorporate VR OOB start checks to properly HA the VR

Posted by GitBox <gi...@apache.org>.
DaanHoogland commented on issue #3915: Incorporate VR OOB start checks to properly HA the VR
URL: https://github.com/apache/cloudstack/pull/3915#issuecomment-599932799
 
 
   If you are interested in getting this in, yes. At the moment we are in a freeze, hopefully for only one or two weeks.
