You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cloudstack.apache.org by ustcweizhou <gi...@git.apache.org> on 2015/12/09 09:47:37 UTC

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

GitHub user ustcweizhou opened a pull request:

    https://github.com/apache/cloudstack/pull/1198

    CLOUDSTACK-9114: restartnetwork with cleanup should restart the RVRs one by one

    
    Testing results:
    
    (1) before change:
    job started at 2015-12-09 09:26:07,570, ended at 2015-12-09 09:28:25,524
    38 packets lost (333 packets transmitted, 295 received, 11% packet loss, time 332284ms)
    
    (2) after change:
    job started at 2015-12-09 09:32:00,543, ended at 2015-12-09 09:34:26,535
    1 packet lost (623 packets transmitted, 622 received, 0% packet loss, time 622598ms)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ustcweizhou/cloudstack restart-network

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/1198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1198
    
----
commit 9f513d48614cb1983565bea29b98b8aa56aee3ac
Author: Wei Zhou <w....@tech.leaseweb.com>
Date:   2015-12-09T08:43:18Z

    CLOUDSTACK-9114: restartnetwork with cleanup should restart the RVRs one by one
    
    Testing results:
    
    (1) before change:
    job started at 2015-12-09 09:26:07,570, ended at 2015-12-09 09:28:25,524
    38 packets lost (333 packets transmitted, 295 received, 11% packet loss, time 332284ms)
    
    (2) after change:
    job started at 2015-12-09 09:32:00,543, ended at 2015-12-09 09:34:26,535
    1 packet lost (623 packets transmitted, 622 received, 0% packet loss, time 622598ms)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-173196826
  
    Ping @remibergsma @borisroman 
    
    This one needs an integration test to cover the changes. Worth writing one for. Who got time left? :)
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-164199095
  
    @DaanHoogland @wilderrodrigues 
    assume there are two routers now routerA (master) and routerB (backup).
    if  we destroy routerA (master) at first, then routerB will become master without downtime. and another routerC will create as BACKUP. However, we need to destroy routerB as well, then routerC will become MASTER and another new routerD will become BACKUP.
    
    in this processing, we also need to wait some seconds for keepalived/conntrackd on routerC to be up , because we should make sure that the services on BACKUP are running before destroying MASTER.
    
    hence, I think the solution in PR is good (there is only one BACKUP->MASTER state change).
    
    for the sleep or not, we need to test it. I added it because the test failedout in my testing (without sleep).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1198#discussion_r47328907
  
    --- Diff: engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java ---
    @@ -2558,6 +2575,62 @@ public boolean restartNetwork(Long networkId, Account callerAccount, User caller
             }
         }
     
    +    /* If there are redundant routers in the isolated network, we follow the steps to make the network working better
    +     *  (1) destroy backup router if exists
    +     *  (2) create new backup router
    +     *  (3) destroy master router (then the backup will become master)
    +     *  (4) create a new router as backup router.
    +     */
    +    private boolean restartGuestNetworkWithRedundantRouters(NetworkVO network, List<DomainRouterVO> routers, ReservationContext context) throws ResourceUnavailableException, ConcurrentOperationException, InsufficientCapacityException {
    +        Account caller = CallContext.current().getCallingAccount();
    +        long callerUserId = CallContext.current().getCallingUserId();
    +
    +        // check the master and backup redundant state
    +        DomainRouterVO masterRouter = null;
    +        DomainRouterVO backupRouter = null;
    +        if (routers != null && routers.size() == 1) {
    +            masterRouter = routers.get(0);
    +        } if (routers != null && routers.size() == 2) {
    +            DomainRouterVO router1 = routers.get(0);
    +            DomainRouterVO router2 = routers.get(1);
    +            if (router1.getRedundantState() == RedundantState.MASTER || router2.getRedundantState() == RedundantState.BACKUP) {
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            } else if (router1.getRedundantState() == RedundantState.BACKUP || router2.getRedundantState() == RedundantState.MASTER) {
    +                masterRouter = router2;
    +                backupRouter = router1;
    +            } else { // both routers are in UNKNOWN state
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            }
    +        }
    +
    +        NetworkOfferingVO offering = _networkOfferingDao.findByIdIncludingRemoved(network.getNetworkOfferingId());
    +        DeployDestination dest = new DeployDestination(_dcDao.findById(network.getDataCenterId()), null, null, null);
    +        List<Provider> providersToImplement = getNetworkProviders(network.getId());
    +
    +        // destroy backup router
    +        if (backupRouter != null) {
    +            _routerService.destroyRouter(backupRouter.getId(), caller, callerUserId);
    +        }
    +        // create new backup router
    +        implementNetworkElements(dest, context, network, offering, providersToImplement);
    +
    +        // destroy master router
    +        if (masterRouter != null) {
    +            try {
    +                Thread.sleep(10000); // wait 10s for the keepalived/conntrackd on backup router
    --- End diff --
    
    Hi @ustcweizhou 
    
    Guess what... we are all busy! :)
    
    I will give a Look Okay to Me to your changes, get those in and improve it, but for this last time only.
    
    If we keep saying we are busy to test our changes, we won't improve the software we rely to run our customers' platforms. Next change without marvin tests will get a -1 from me.
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1198#discussion_r47204055
  
    --- Diff: engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java ---
    @@ -2558,6 +2575,62 @@ public boolean restartNetwork(Long networkId, Account callerAccount, User caller
             }
         }
     
    +    /* If there are redundant routers in the isolated network, we follow the steps to make the network working better
    +     *  (1) destroy backup router if exists
    +     *  (2) create new backup router
    +     *  (3) destroy master router (then the backup will become master)
    +     *  (4) create a new router as backup router.
    +     */
    +    private boolean restartGuestNetworkWithRedundantRouters(NetworkVO network, List<DomainRouterVO> routers, ReservationContext context) throws ResourceUnavailableException, ConcurrentOperationException, InsufficientCapacityException {
    +        Account caller = CallContext.current().getCallingAccount();
    +        long callerUserId = CallContext.current().getCallingUserId();
    +
    +        // check the master and backup redundant state
    +        DomainRouterVO masterRouter = null;
    +        DomainRouterVO backupRouter = null;
    +        if (routers != null && routers.size() == 1) {
    +            masterRouter = routers.get(0);
    +        } if (routers != null && routers.size() == 2) {
    +            DomainRouterVO router1 = routers.get(0);
    +            DomainRouterVO router2 = routers.get(1);
    +            if (router1.getRedundantState() == RedundantState.MASTER || router2.getRedundantState() == RedundantState.BACKUP) {
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            } else if (router1.getRedundantState() == RedundantState.BACKUP || router2.getRedundantState() == RedundantState.MASTER) {
    +                masterRouter = router2;
    +                backupRouter = router1;
    +            } else { // both routers are in UNKNOWN state
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            }
    +        }
    +
    +        NetworkOfferingVO offering = _networkOfferingDao.findByIdIncludingRemoved(network.getNetworkOfferingId());
    +        DeployDestination dest = new DeployDestination(_dcDao.findById(network.getDataCenterId()), null, null, null);
    +        List<Provider> providersToImplement = getNetworkProviders(network.getId());
    +
    +        // destroy backup router
    +        if (backupRouter != null) {
    +            _routerService.destroyRouter(backupRouter.getId(), caller, callerUserId);
    +        }
    +        // create new backup router
    +        implementNetworkElements(dest, context, network, offering, providersToImplement);
    +
    +        // destroy master router
    +        if (masterRouter != null) {
    +            try {
    +                Thread.sleep(10000); // wait 10s for the keepalived/conntrackd on backup router
    --- End diff --
    
    It's a blocking call, you don't need it. Also, the keepalived is started with the kernel module. It would be way easier to prove the behaviour with an automated test.
    
    If you don't feel confident with writing the test, I can do it after christmas/new year and change the java code as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-219629360
  
    closed as I have more commits for restart network, restartvpc, update network, upgrade network (from old version 4.5 or before).
    the downtime of network is reduced to 1-5 seconds.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163554256
  
    @wilderrodrigues 
    
    I used VirtualNetworkApplianceService at first, and met an issue like:
    "spring autowiring with unique beans: Spring expected single matching bean but found 2"
    this is because some functions are implemented in both of VirtualNetworkApplianceManagerImpl and VpcVirtualNetworkApplianceManagerImpl .
    
    I changed it back to VirtualNetworkApplianceService  just now, the compilation succeed, the issue  does not exist any more, weird. 
    
    Anyway, I think it is not important. that is the same to definition in api/src/org/apache/cloudstack/api/BaseCmd.java:    
    public VpcVirtualNetworkApplianceService _routerService;



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163867468
  
    @remibergsma @DaanHoogland 
    
    This PR Looks Okay To Me, no good, but okay. Please merge it and I will improve the code and add Marvin tests to cover it because @ustcweizhou said he is too busy to do that.
    
    I'm also busy, we are all busy. It should not be used as excuse to avoid writing tests. Even our managers know that.
    
    I went into berserker mode and told him that next PR without tests will get a -1 from me, and I will hold to that promise.
    
    Cheers,
    Wilder  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by remibergsma <gi...@git.apache.org>.
Github user remibergsma commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163864347
  
    FYI, run integration tests:
    
    ```
    nosetests --with-marvin --marvin-config=${marvinCfg} -s -a tags=advanced,required_hardware=true \
    component/test_vpc_redundant.py \
    component/test_routers_iptables_default_policy.py \
    component/test_routers_network_ops.py \
    component/test_vpc_router_nics.py \
    smoke/test_loadbalance.py \
    smoke/test_internal_lb.py \
    smoke/test_ssvm.py \
    smoke/test_network.py
    
    ```
    
    Result:
    
    ```
    Create a redundant VPC with two networks with two VMs in each network ... === TestName: test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Status : SUCCESS ===
    ok
    Create a redundant VPC with two networks with two VMs in each network and check default routes ... === TestName: test_02_redundant_VPC_default_routes | Status : SUCCESS ===
    ok
    Create a redundant VPC with two networks with two VMs in each network ... === TestName: test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Status : SUCCESS ===
    ok
    Test iptables default INPUT/FORWARD policy on RouterVM ... === TestName: test_02_routervm_iptables_policies | Status : SUCCESS ===
    ok
    Test iptables default INPUT/FORWARD policies on VPC router ... === TestName: test_01_single_VPC_iptables_policies | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_01_isolate_network_FW_PF_default_routes_egress_true | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_02_isolate_network_FW_PF_default_routes_egress_false | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_03_RVR_Network_check_router_state | Status : SUCCESS ===
    ok
    Create a VPC with two networks with one VM in each network and test nics after destroy ... === TestName: test_01_VPC_nics_after_destroy | Status : SUCCESS ===
    ok
    Create a VPC with two networks with one VM in each network and test default routes ... === TestName: test_02_VPC_default_routes | Status : SUCCESS ===
    ok
    Check the password file in the Router VM ... === TestName: test_isolate_network_password_server | Status : SUCCESS ===
    ok
    Check that the /etc/dhcphosts.txt doesn't contain duplicate IPs ... === TestName: test_router_dhcphosts | Status : SUCCESS ===
    ok
    Test to create Load balancing rule with source NAT ... === TestName: test_01_create_lb_rule_src_nat | Status : SUCCESS ===
    ok
    Test to create Load balancing rule with non source NAT ... === TestName: test_02_create_lb_rule_non_nat | Status : SUCCESS ===
    ok
    Test for assign & removing load balancing rule ... === TestName: test_assign_and_removal_lb | Status : SUCCESS ===
    ok
    Test to verify access to loadbalancer haproxy admin stats page ... === TestName: test02_internallb_haproxy_stats_on_all_interfaces | Status : SUCCESS ===
    ok
    Test create, assign, remove of an Internal LB with roundrobin http traffic to 3 vm's ... === TestName: test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 | Status : SUCCESS ===
    ok
    Test SSVM Internals ... === TestName: test_03_ssvm_internals | Status : SUCCESS ===
    ok
    Test CPVM Internals ... === TestName: test_04_cpvm_internals | Status : SUCCESS ===
    ok
    Test stop SSVM ... === TestName: test_05_stop_ssvm | Status : SUCCESS ===
    ok
    Test stop CPVM ... === TestName: test_06_stop_cpvm | Status : SUCCESS ===
    ok
    Test reboot SSVM ... === TestName: test_07_reboot_ssvm | Status : SUCCESS ===
    ok
    Test reboot CPVM ... === TestName: test_08_reboot_cpvm | Status : SUCCESS ===
    ok
    Test destroy SSVM ... === TestName: test_09_destroy_ssvm | Status : SUCCESS ===
    ok
    Test destroy CPVM ... === TestName: test_10_destroy_cpvm | Status : SUCCESS ===
    ok
    Test Remote Access VPN in VPC ... === TestName: test_vpc_remote_access_vpn | Status : SUCCESS ===
    ok
    Test VPN in VPC ... === TestName: test_vpc_site2site_vpn | Status : SUCCESS ===
    ok
    test_01_vpc_privategw_acl (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_01_vpc_privategw_acl | Status : SUCCESS ===
    ok
    test_02_vpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_02_vpc_privategw_static_routes | Status : SUCCESS ===
    ok
    test_03_rvpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_03_rvpc_privategw_static_routes | Status : SUCCESS ===
    ok
    Test for port forwarding on source NAT ... === TestName: test_01_port_fwd_on_src_nat | Status : SUCCESS ===
    ok
    Test for port forwarding on non source NAT ... === TestName: test_02_port_fwd_on_non_src_nat | Status : SUCCESS ===
    ok
    Test for reboot router ... === TestName: test_reboot_router | Status : SUCCESS ===
    ok
    Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_1_static_nat_rule | Status : SUCCESS ===
    ok
    Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_2_nat_rule | Status : SUCCESS ===
    ok
    Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_3_Load_Balancer_Rule | Status : SUCCESS ===
    ok
    
    ----------------------------------------------------------------------
    Ran 38 tests in 20142.499s
    
    OK
    ```
    
    
    And:
    
    ```
    nosetests --with-marvin --marvin-config=${marvinCfg} -s -a tags=advanced,required_hardware=false \
    smoke/test_routers.py \
    smoke/test_network_acl.py \
    smoke/test_privategw_acl.py \
    smoke/test_reset_vm_on_reboot.py \
    smoke/test_vm_life_cycle.py \
    smoke/test_vpc_vpn.py \
    smoke/test_service_offerings.py \
    component/test_vpc_offerings.py \
    component/test_vpc_routers.py
    ```
    
    Result:
    
    ```
    Test router internal advanced zone ... === TestName: test_02_router_internal_adv | Status : SUCCESS ===
    ok
    Test restart network ... === TestName: test_03_restart_network_cleanup | Status : SUCCESS ===
    ok
    Test router basic setup ... === TestName: test_05_router_basic | Status : SUCCESS ===
    ok
    Test router advanced setup ... === TestName: test_06_router_advanced | Status : SUCCESS ===
    ok
    Test stop router ... === TestName: test_07_stop_router | Status : SUCCESS ===
    ok
    Test start router ... === TestName: test_08_start_router | Status : SUCCESS ===
    ok
    Test reboot router ... === TestName: test_09_reboot_router | Status : SUCCESS ===
    ok
    Test reset virtual machine on reboot ... === TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
    ok
    Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS ===
    ok
    Test Deploy Virtual Machine ... === TestName: test_deploy_vm | Status : SUCCESS ===
    ok
    Test Multiple Deploy Virtual Machine ... === TestName: test_deploy_vm_multiple | Status : SUCCESS ===
    ok
    Test Stop Virtual Machine ... === TestName: test_01_stop_vm | Status : SUCCESS ===
    ok
    Test Start Virtual Machine ... === TestName: test_02_start_vm | Status : SUCCESS ===
    ok
    Test Reboot Virtual Machine ... === TestName: test_03_reboot_vm | Status : SUCCESS ===
    ok
    Test destroy Virtual Machine ... === TestName: test_06_destroy_vm | Status : SUCCESS ===
    ok
    Test recover Virtual Machine ... === TestName: test_07_restore_vm | Status : SUCCESS ===
    ok
    Test migrate VM ... === TestName: test_08_migrate_vm | Status : SUCCESS ===
    ok
    Test destroy(expunge) Virtual Machine ... === TestName: test_09_expunge_vm | Status : SUCCESS ===
    ok
    Test to create service offering ... === TestName: test_01_create_service_offering | Status : SUCCESS ===
    ok
    Test to update existing service offering ... === TestName: test_02_edit_service_offering | Status : SUCCESS ===
    ok
    Test to delete service offering ... === TestName: test_03_delete_service_offering | Status : SUCCESS ===
    ok
    Test for delete account ... === TestName: test_delete_account | Status : SUCCESS ===
    ok
    Test for Associate/Disassociate public IP address for admin account ... === TestName: test_public_ip_admin_account | Status : SUCCESS ===
    ok
    Test for Associate/Disassociate public IP address for user account ... === TestName: test_public_ip_user_account | Status : SUCCESS ===
    ok
    Test for release public IP address ... === TestName: test_releaseIP | Status : SUCCESS ===
    ok
    Test create VPC offering ... === TestName: test_01_create_vpc_offering | Status : SUCCESS ===
    ok
    Test VPC offering without load balancing service ... === TestName: test_03_vpc_off_without_lb | Status : SUCCESS ===
    ok
    Test VPC offering without static NAT service ... === TestName: test_04_vpc_off_without_static_nat | Status : SUCCESS ===
    ok
    Test VPC offering without port forwarding service ... === TestName: test_05_vpc_off_without_pf | Status : SUCCESS ===
    ok
    Test VPC offering with invalid services ... === TestName: test_06_vpc_off_invalid_services | Status : SUCCESS ===
    ok
    Test update VPC offering ... === TestName: test_07_update_vpc_off | Status : SUCCESS ===
    ok
    Test list VPC offering ... === TestName: test_08_list_vpc_off | Status : SUCCESS ===
    ok
    test_09_create_redundant_vpc_offering (integration.component.test_vpc_offerings.TestVPCOffering) ... === TestName: test_09_create_redundant_vpc_offering | Status : SUCCESS ===
    ok
    Test start/stop of router after addition of one guest network ... === TestName: test_01_start_stop_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test reboot of router after addition of one guest network ... === TestName: test_02_reboot_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test to change service offering of router after addition of one guest network ... === TestName: test_04_chg_srv_off_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test destroy of router after addition of one guest network ... === TestName: test_05_destroy_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test to stop and start router after creation of VPC ... === TestName: test_01_stop_start_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Test to reboot the router after creating a VPC ... === TestName: test_02_reboot_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Tests to change service offering of the Router after ... === TestName: test_04_change_service_offerring_vpc | Status : SUCCESS ===
    ok
    Test to destroy the router after creating a VPC ... === TestName: test_05_destroy_router_after_creating_vpc | Status : SUCCESS ===
    ok
    
    ----------------------------------------------------------------------
    Ran 41 tests in 8575.467s
    
    OK
    ```
    
    Sounds like a great feature @ustcweizhou @wilderrodrigues and I think we need to do this for both RVR and RVPC. When you guys are ready I can run the tests again. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163557163
  
    I know this issue. You can get it working with @Autowire and @Quialifier annotations. If BaseCmd does the same, then it should be fine.
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1198#discussion_r47202713
  
    --- Diff: engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java ---
    @@ -2558,6 +2575,62 @@ public boolean restartNetwork(Long networkId, Account callerAccount, User caller
             }
         }
     
    +    /* If there are redundant routers in the isolated network, we follow the steps to make the network working better
    +     *  (1) destroy backup router if exists
    +     *  (2) create new backup router
    +     *  (3) destroy master router (then the backup will become master)
    +     *  (4) create a new router as backup router.
    +     */
    +    private boolean restartGuestNetworkWithRedundantRouters(NetworkVO network, List<DomainRouterVO> routers, ReservationContext context) throws ResourceUnavailableException, ConcurrentOperationException, InsufficientCapacityException {
    +        Account caller = CallContext.current().getCallingAccount();
    +        long callerUserId = CallContext.current().getCallingUserId();
    +
    +        // check the master and backup redundant state
    +        DomainRouterVO masterRouter = null;
    +        DomainRouterVO backupRouter = null;
    +        if (routers != null && routers.size() == 1) {
    +            masterRouter = routers.get(0);
    +        } if (routers != null && routers.size() == 2) {
    +            DomainRouterVO router1 = routers.get(0);
    +            DomainRouterVO router2 = routers.get(1);
    +            if (router1.getRedundantState() == RedundantState.MASTER || router2.getRedundantState() == RedundantState.BACKUP) {
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            } else if (router1.getRedundantState() == RedundantState.BACKUP || router2.getRedundantState() == RedundantState.MASTER) {
    +                masterRouter = router2;
    +                backupRouter = router1;
    +            } else { // both routers are in UNKNOWN state
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            }
    +        }
    +
    +        NetworkOfferingVO offering = _networkOfferingDao.findByIdIncludingRemoved(network.getNetworkOfferingId());
    +        DeployDestination dest = new DeployDestination(_dcDao.findById(network.getDataCenterId()), null, null, null);
    +        List<Provider> providersToImplement = getNetworkProviders(network.getId());
    +
    +        // destroy backup router
    +        if (backupRouter != null) {
    +            _routerService.destroyRouter(backupRouter.getId(), caller, callerUserId);
    +        }
    +        // create new backup router
    +        implementNetworkElements(dest, context, network, offering, providersToImplement);
    +
    +        // destroy master router
    +        if (masterRouter != null) {
    +            try {
    +                Thread.sleep(10000); // wait 10s for the keepalived/conntrackd on backup router
    --- End diff --
    
    if we can make sure that the keepalived/conntrackd are running when the VR becomes Running.
    I think the sleep is needed, no matter we destroy/create backup or master router at first, because there is always a state change from BACKUP to MASTER.
    
    in your method, the sleep is not needed at the first step, but it does at the second step.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by rhtyd <gi...@git.apache.org>.
Github user rhtyd commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-216207184
  
    @ustcweizhou thanks, can you rebase against latest master and share state of your PR
    
    this looks like an interesting change we should have
    
    tag:needlove


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1198#discussion_r47204914
  
    --- Diff: engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java ---
    @@ -2558,6 +2575,62 @@ public boolean restartNetwork(Long networkId, Account callerAccount, User caller
             }
         }
     
    +    /* If there are redundant routers in the isolated network, we follow the steps to make the network working better
    +     *  (1) destroy backup router if exists
    +     *  (2) create new backup router
    +     *  (3) destroy master router (then the backup will become master)
    +     *  (4) create a new router as backup router.
    +     */
    +    private boolean restartGuestNetworkWithRedundantRouters(NetworkVO network, List<DomainRouterVO> routers, ReservationContext context) throws ResourceUnavailableException, ConcurrentOperationException, InsufficientCapacityException {
    +        Account caller = CallContext.current().getCallingAccount();
    +        long callerUserId = CallContext.current().getCallingUserId();
    +
    +        // check the master and backup redundant state
    +        DomainRouterVO masterRouter = null;
    +        DomainRouterVO backupRouter = null;
    +        if (routers != null && routers.size() == 1) {
    +            masterRouter = routers.get(0);
    +        } if (routers != null && routers.size() == 2) {
    +            DomainRouterVO router1 = routers.get(0);
    +            DomainRouterVO router2 = routers.get(1);
    +            if (router1.getRedundantState() == RedundantState.MASTER || router2.getRedundantState() == RedundantState.BACKUP) {
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            } else if (router1.getRedundantState() == RedundantState.BACKUP || router2.getRedundantState() == RedundantState.MASTER) {
    +                masterRouter = router2;
    +                backupRouter = router1;
    +            } else { // both routers are in UNKNOWN state
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            }
    +        }
    +
    +        NetworkOfferingVO offering = _networkOfferingDao.findByIdIncludingRemoved(network.getNetworkOfferingId());
    +        DeployDestination dest = new DeployDestination(_dcDao.findById(network.getDataCenterId()), null, null, null);
    +        List<Provider> providersToImplement = getNetworkProviders(network.getId());
    +
    +        // destroy backup router
    +        if (backupRouter != null) {
    +            _routerService.destroyRouter(backupRouter.getId(), caller, callerUserId);
    +        }
    +        // create new backup router
    +        implementNetworkElements(dest, context, network, offering, providersToImplement);
    +
    +        // destroy master router
    +        if (masterRouter != null) {
    +            try {
    +                Thread.sleep(10000); // wait 10s for the keepalived/conntrackd on backup router
    --- End diff --
    
    @wilderrodrigues 
    I made this change for 4.2.1 at first. I met some issues during the restartnetwork because of the keepalived/conntrackd, hence I added the sleep. 10 seconds is enough to wait the services to be running.
    I did not check the difference between keepalived on Debian 7.0.0 and Debian 7.9.0, so I thought the issue still exist.
    
    I am busy on other issues and have no time on writing the test, automated test and improvement are welcome.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163541774
  
    Hi @ustcweizhou,
    
    The content is a bit misleading because you used the ```VpcVirtualNetworkApplianceService``` instead of ```VirtualNetworkApplianceService```.
    
    Concerning the tests, I know you did that. But if tomorrow (or within a week) we have to improve it, we will have to do the tests, manually, again. So improving the existing integration test will help everybody.
    
    Could you also apply the changes I mentioned against the code, please?
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1198#discussion_r47201935
  
    --- Diff: engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java ---
    @@ -2558,6 +2575,62 @@ public boolean restartNetwork(Long networkId, Account callerAccount, User caller
             }
         }
     
    +    /* If there are redundant routers in the isolated network, we follow the steps to make the network working better
    +     *  (1) destroy backup router if exists
    +     *  (2) create new backup router
    +     *  (3) destroy master router (then the backup will become master)
    +     *  (4) create a new router as backup router.
    +     */
    +    private boolean restartGuestNetworkWithRedundantRouters(NetworkVO network, List<DomainRouterVO> routers, ReservationContext context) throws ResourceUnavailableException, ConcurrentOperationException, InsufficientCapacityException {
    +        Account caller = CallContext.current().getCallingAccount();
    +        long callerUserId = CallContext.current().getCallingUserId();
    +
    +        // check the master and backup redundant state
    +        DomainRouterVO masterRouter = null;
    +        DomainRouterVO backupRouter = null;
    +        if (routers != null && routers.size() == 1) {
    +            masterRouter = routers.get(0);
    +        } if (routers != null && routers.size() == 2) {
    +            DomainRouterVO router1 = routers.get(0);
    +            DomainRouterVO router2 = routers.get(1);
    +            if (router1.getRedundantState() == RedundantState.MASTER || router2.getRedundantState() == RedundantState.BACKUP) {
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            } else if (router1.getRedundantState() == RedundantState.BACKUP || router2.getRedundantState() == RedundantState.MASTER) {
    +                masterRouter = router2;
    +                backupRouter = router1;
    +            } else { // both routers are in UNKNOWN state
    +                masterRouter = router1;
    +                backupRouter = router2;
    +            }
    +        }
    +
    +        NetworkOfferingVO offering = _networkOfferingDao.findByIdIncludingRemoved(network.getNetworkOfferingId());
    +        DeployDestination dest = new DeployDestination(_dcDao.findById(network.getDataCenterId()), null, null, null);
    +        List<Provider> providersToImplement = getNetworkProviders(network.getId());
    +
    +        // destroy backup router
    +        if (backupRouter != null) {
    +            _routerService.destroyRouter(backupRouter.getId(), caller, callerUserId);
    +        }
    +        // create new backup router
    +        implementNetworkElements(dest, context, network, offering, providersToImplement);
    +
    +        // destroy master router
    +        if (masterRouter != null) {
    +            try {
    +                Thread.sleep(10000); // wait 10s for the keepalived/conntrackd on backup router
    --- End diff --
    
    The ```implementNetworkElements()``` is a blocking call. So, you don't need the sleep here. Even if you would, 10 seconds is way too much.
    
    I would suggest that first you destroy the master router! The back becomes master in no time, so no sleep needed. Then you spin up a new router (which will be backup, and destroy the previous (old) master. Serious, no sleep needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1198#discussion_r47201591
  
    --- Diff: engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java ---
    @@ -2558,6 +2575,62 @@ public boolean restartNetwork(Long networkId, Account callerAccount, User caller
             }
         }
     
    +    /* If there are redundant routers in the isolated network, we follow the steps to make the network working better
    +     *  (1) destroy backup router if exists
    +     *  (2) create new backup router
    +     *  (3) destroy master router (then the backup will become master)
    +     *  (4) create a new router as backup router.
    +     */
    +    private boolean restartGuestNetworkWithRedundantRouters(NetworkVO network, List<DomainRouterVO> routers, ReservationContext context) throws ResourceUnavailableException, ConcurrentOperationException, InsufficientCapacityException {
    +        Account caller = CallContext.current().getCallingAccount();
    +        long callerUserId = CallContext.current().getCallingUserId();
    +
    +        // check the master and backup redundant state
    +        DomainRouterVO masterRouter = null;
    +        DomainRouterVO backupRouter = null;
    +        if (routers != null && routers.size() == 1) {
    +            masterRouter = routers.get(0);
    +        } if (routers != null && routers.size() == 2) {
    --- End diff --
    
    Avoid magic numbers by putting them in constants.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by remibergsma <gi...@git.apache.org>.
Github user remibergsma commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163329948
  
    Sounds like a great improvement. Will test it!
    
    Pinging @wilderrodrigues to have a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163875769
  
    @wilderrodrigues thanks!
    I am not in a hurry to merge it into 4.6 or master. Let's merge it after fully testing (scripts and functional test) and 2+ LGTM.
    
    I did not write integration test script and have marvin tests for long time, the last I remember is the integration test for advanced zone with security group (e6863c612bfeda957c4e2acf14647c029a48d2c8). maybe @DaanHoogland can help us.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou closed the pull request at:

    https://github.com/apache/cloudstack/pull/1198


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by ustcweizhou <gi...@git.apache.org>.
Github user ustcweizhou commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163540200
  
    @wilderrodrigues thanks for review!
    This PR apply on redundant networks, not redundant VR for VPC.
    I tested it several times. you may get a result from my first comment of this PR.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by DaanHoogland <gi...@git.apache.org>.
Github user DaanHoogland commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-164186641
  
    I will look at improving the integration tests but can you do the improvements to the code as @wilderrodrigues suggested (or comment on them)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack issue #1198: CLOUDSTACK-9114: restartnetwork with cleanup should ...

Posted by remibergsma <gi...@git.apache.org>.
Github user remibergsma commented on the issue:

    https://github.com/apache/cloudstack/pull/1198
  
    @ustcweizhou If you find some time to submit PRs for restartVPC+cleanup which does the router rebuild in sequence that would be awesome. Happy to help testing! Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9114: restartnetwork with clea...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1198#issuecomment-163539499
  
    Hi @ustcweizhou,
    
    I made a couple of comments in your code that would need addressing. Could you have a look at that?
    
    In addition, would you mind improving the smoke/test_vpc_redundant.py in order to add a check for the improvements you added?
    
    Did you test this against RVR networks as well?
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---