Posted to dev@cloudstack.apache.org by Tim Gipson <tg...@ena.com.INVALID> on 2017/09/12 19:12:48 UTC

Question concerning Virtual Routers and problems during failover

Hey all,

I’ve found what I think could be an issue with the redundant VPC router pairs in CloudStack.  The issue was first noticed when routers were failing over from master to backup.  When the backup router became master, everything continued to work properly and traffic flowed as normal.  However, when it failed over from the new master back to the original master, the virtual router stopped passing traffic on any network interface, and every failover after that also resulted in virtual routers that were not passing traffic.

I can reproduce this behavior by doing a manual failover (logging in and issuing a reboot command on the router) from master to backup and then back to the original master.  From what I can tell, the iptables rules on the router are somehow modified during the failover (or a manual reboot) in such a way as to make them completely nonfunctional.  I did a side-by-side comparison of the iptables rules before and after a failover, and there are definite differences.  Sometimes rules are changed, sometimes they are duplicated, and I’ve even found that some rules are missing from iptables entirely.
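
For anyone wanting to repeat this kind of side-by-side comparison, here is a minimal sketch in Python.  It assumes both dumps are in `iptables-save` (or `iptables -S`) format, where each rule is a line starting with `-A`; the attached files may be in a different format, in which case the parsing would need adjusting.  The function names `parse_rules` and `compare_rules` are hypothetical helpers, not part of CloudStack or iptables.

```python
import sys
from collections import Counter


def parse_rules(dump_text):
    """Count each rule per table in iptables-save style text.

    Assumption: tables are introduced by a '*name' line and rules are
    lines starting with '-A '. Everything else is ignored.
    """
    rules = Counter()
    table = ""  # stays empty for 'iptables -S' output, which has no table lines
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            table = line[1:]
        elif line.startswith("-A "):
            rules[(table, line)] += 1
    return rules


def compare_rules(working_text, broken_text):
    """Report rules that are missing, added, or duplicated in the broken dump."""
    working = parse_rules(working_text)
    broken = parse_rules(broken_text)
    return {
        "missing": sorted(r for r in working if r not in broken),
        "added": sorted(r for r in broken if r not in working),
        # A rule counts as duplicated when it appears more than once in the
        # broken dump and more often than it did in the working dump.
        "duplicated": sorted(
            r for r, n in broken.items() if n > 1 and n > working[r]
        ),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as w, open(sys.argv[2]) as b:
        report = compare_rules(w.read(), b.read())
    for kind, rules in report.items():
        print(f"{kind}: {len(rules)}")
        for table, rule in rules:
            print(f"  [{table}] {rule}")
```

Run it as `python compare_rules.py iptables_working.txt iptables_broken.txt` (the script name is arbitrary).  Rules that were changed rather than dropped show up as one "missing" entry plus one "added" entry.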

We are running in a CentOS 7 environment and using KVM as our hypervisor.  Our CS version is 4.8 with standard images for the VRs.  As mentioned previously, our VRs are in redundant pairs for VPCs.

I’ve attached two iptables outputs, one from a working router and one from a broken router after failover.

Any help or direction you could provide to help me further identify why this is happening would be appreciated.

Thanks!

Tim Gipson
<https://www.ena.com/>

 


Re: Question concerning Virtual Routers and problems during failover

Posted by Tim Gipson <tg...@ena.com.INVALID>.
I just opened a JIRA issue, https://issues.apache.org/jira/browse/CLOUDSTACK-10074, and attached my iptables files as well as the management server logs and the logs from the routers.  I started the manual failover at around 14:40, so that should help anyone wanting to look at the logs.
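
To narrow a large log down to the period around the failover, something like the following Python sketch may help.  It assumes each log entry starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS`, which may not hold for every log attached to the ticket; the helper name `lines_in_window`, the ten-minute margin, and the date in the usage example are all illustrative choices.

```python
from datetime import datetime, timedelta


def lines_in_window(lines, center, minutes=10):
    """Return log lines stamped within +/- `minutes` of `center`.

    Assumption: a new entry begins with 'YYYY-MM-DD HH:MM:SS'. Lines
    without a parsable timestamp (stack traces, wrapped output) are
    treated as continuations of the entry before them.
    """
    start = center - timedelta(minutes=minutes)
    end = center + timedelta(minutes=minutes)
    keep = False
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            keep = start <= stamp <= end
        except ValueError:
            pass  # continuation line: reuse the decision for the previous entry
        if keep:
            selected.append(line)
    return selected
```

For example, `lines_in_window(open("management-server.log"), datetime(2017, 9, 13, 14, 40))` would keep entries from 14:30 to 14:50; the file name and date here are placeholders to adjust to the actual log and day of the test.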

Thanks!

Tim Gipson
Systems Engineer
Direct: 615-312-6157
Mobile: 615-585-3652

 <https://www.ena.com/>
 



Re: Question concerning Virtual Routers and problems during failover

Posted by Tim Gipson <tg...@ena.com.INVALID>.
Sure, I’ll need to recreate a failure scenario so I can capture all that data for you.  I’ll post it here as soon as I’ve got it.

Thanks!

Tim Gipson
Systems Engineer
Direct: 615-312-6157
Mobile: 615-585-3652

 <https://www.ena.com/>
 



Re: Question concerning Virtual Routers and problems during failover

Posted by Nitin Kumar Maharana <ni...@accelerite.com>.
Hi Tim,

Can you please attach both VRs’ cloud.log (located on each VR at /var/log/cloud.log) as well as the management server log for the failure case?
That will help us find the exact cause of the failure.


Thanks,
Nitin

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Accelerite, a Persistent Systems business. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Accelerite, a Persistent Systems business does not accept any liability for virus infected mails.