You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cloudstack.apache.org by harikrishna-patnala <gi...@git.apache.org> on 2015/12/07 07:55:24 UTC

[GitHub] cloudstack pull request: Deploy VM failing frequently due to capac...

GitHub user harikrishna-patnala opened a pull request:

    https://github.com/apache/cloudstack/pull/1180

    Deploy VM failing frequently due to capacity calculation not synchron…

    
    
    There are some VM deployment failures happening when multiple VMs are deployed at a time, failures mainly due to NetworkModel code that iterates over all the vlans in the pod. This causes each deployVM thread to hold the global lock on Network longer and cause delays. This delay in turn causes more threads to choose same host and fail since capacity is not available on that host.
    
    Following are some changes required to be done to reduce delays during VM deployments which in turn causes some vm deployment failures when multiple VMs are launched at a time.
    
      -  In Planner, remove the clusters that do not contain a host with matching service offering tag. This will save some iterations over clusters that dont have matching tagged host
      -  In NetworkModel, do not query the vlans for the pod within the loop. Also optimized the logic to query the ip/ipv6
      -  In DeploymentPlanningManagerImpl, do not process the affinity group if the plan has hostId provided.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harikrishna-patnala/cloudstack CLOUDSTACK-9112

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/1180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1180
    
----
commit 8394e1940535b05525bebebdb8fcc73fcaefc709
Author: Harikrishna Patnala <ha...@citrix.com>
Date:   2015-12-04T09:57:53Z

    Deploy VM failing frequently due to capacity calculation not synchronized
    
    Changes:
    - In Planner, remove the clusters that do not contain a host with matching service offering tag. This will save some iterations over clusters that dont have matching tagged host
    - In NetworkModel, do not query the vlans for the pod within the loop. Also optimized the logic to query the ip/ipv6
    - In DeploymentPlanningManagerImpl, do not process the affinity group if the plan has hostId provided.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9112: Deploy VM failing freque...

Posted by remibergsma <gi...@git.apache.org>.
Github user remibergsma commented on the pull request:

    https://github.com/apache/cloudstack/pull/1180#issuecomment-162851856
  
    LGTM based on these tests:
    
    ```
    nosetests --with-marvin --marvin-config=${marvinCfg} -s -a tags=advanced,required_hardware=true \
    component/test_vpc_redundant.py \
    component/test_routers_iptables_default_policy.py \
    component/test_routers_network_ops.py \
    component/test_vpc_router_nics.py \
    smoke/test_loadbalance.py \
    smoke/test_internal_lb.py \
    smoke/test_ssvm.py \
    smoke/test_network.py
    
    ```
    
    Result:
    
    ```
    Test iptables default INPUT/FORWARD policies on VPC router ... === TestName: test_01_single_VPC_iptables_policies | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_01_isolate_network_FW_PF_default_routes_egress_true | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_02_isolate_network_FW_PF_default_routes_egress_false | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Status : SUCCESS ===
    ok
    Test redundant router internals ... === TestName: test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Status : SUCCESS ===
    ok
    Create a VPC with two networks with one VM in each network and test nics after destroy ... === TestName: test_01_VPC_nics_after_destroy | Status : SUCCESS ===
    ok
    Create a VPC with two networks with one VM in each network and test default routes ... === TestName: test_02_VPC_default_routes | Status : SUCCESS ===
    ok
    Check the password file in the Router VM ... === TestName: test_isolate_network_password_server | Status : SUCCESS ===
    ok
    Check that the /etc/dhcphosts.txt doesn't contain duplicate IPs ... === TestName: test_router_dhcphosts | Status : SUCCESS ===
    ok
    Test to create Load balancing rule with source NAT ... === TestName: test_01_create_lb_rule_src_nat | Status : SUCCESS ===
    ok
    Test to create Load balancing rule with non source NAT ... === TestName: test_02_create_lb_rule_non_nat | Status : SUCCESS ===
    ok
    Test for assign & removing load balancing rule ... === TestName: test_assign_and_removal_lb | Status : SUCCESS ===
    ok
    Test to verify access to loadbalancer haproxy admin stats page ... === TestName: test02_internallb_haproxy_stats_on_all_interfaces | Status : SUCCESS ===
    ok
    Test create, assign, remove of an Internal LB with roundrobin http traffic to 3 vm's ... === TestName: test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 | Status : SUCCESS ===
    ok
    Test SSVM Internals ... === TestName: test_03_ssvm_internals | Status : SUCCESS ===
    ok
    Test CPVM Internals ... === TestName: test_04_cpvm_internals | Status : SUCCESS ===
    ok
    Test stop SSVM ... === TestName: test_05_stop_ssvm | Status : SUCCESS ===
    ok
    Test stop CPVM ... === TestName: test_06_stop_cpvm | Status : SUCCESS ===
    ok
    Test reboot SSVM ... === TestName: test_07_reboot_ssvm | Status : SUCCESS ===
    ok
    Test reboot CPVM ... === TestName: test_08_reboot_cpvm | Status : SUCCESS ===
    ok
    Test destroy SSVM ... === TestName: test_09_destroy_ssvm | Status : SUCCESS ===
    ok
    Test destroy CPVM ... === TestName: test_10_destroy_cpvm | Status : SUCCESS ===
    ok
    Test Remote Access VPN in VPC ... === TestName: test_vpc_remote_access_vpn | Status : SUCCESS ===
    ok
    Test VPN in VPC ... === TestName: test_vpc_site2site_vpn | Status : SUCCESS ===
    ok
    Test for port forwarding on source NAT ... === TestName: test_01_port_fwd_on_src_nat | Status : SUCCESS ===
    ok
    Test for port forwarding on non source NAT ... === TestName: test_02_port_fwd_on_non_src_nat | Status : SUCCESS ===
    ok
    Test for reboot router ... === TestName: test_reboot_router | Status : SUCCESS ===
    ok
    Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_1_static_nat_rule | Status : SUCCESS ===
    ok
    Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_2_nat_rule | Status : SUCCESS ===
    ok
    Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_3_Load_Balancer_Rule | Status : SUCCESS ===
    ok
    
    ----------------------------------------------------------------------
    Ran 33 tests in 17222.324s
    
    OK
    ```
    
    
    And:
    
    ```
    nosetests --with-marvin --marvin-config=${marvinCfg} -s -a tags=advanced,required_hardware=false \
    smoke/test_routers.py \
    smoke/test_network_acl.py \
    smoke/test_privategw_acl.py \
    smoke/test_reset_vm_on_reboot.py \
    smoke/test_vm_life_cycle.py \
    smoke/test_vpc_vpn.py \
    smoke/test_service_offerings.py \
    component/test_vpc_offerings.py \
    component/test_vpc_routers.py
    ```
    
    Result:
    
    ```
    Test router internal advanced zone ... === TestName: test_02_router_internal_adv | Status : SUCCESS ===
    ok
    Test restart network ... === TestName: test_03_restart_network_cleanup | Status : SUCCESS ===
    ok
    Test router basic setup ... === TestName: test_05_router_basic | Status : SUCCESS ===
    ok
    Test router advanced setup ... === TestName: test_06_router_advanced | Status : SUCCESS ===
    ok
    Test stop router ... === TestName: test_07_stop_router | Status : SUCCESS ===
    ok
    Test start router ... === TestName: test_08_start_router | Status : SUCCESS ===
    ok
    Test reboot router ... === TestName: test_09_reboot_router | Status : SUCCESS ===
    ok
    test_privategw_acl (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_privategw_acl | Status : SUCCESS ===
    ok
    Test reset virtual machine on reboot ... === TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
    ok
    Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS ===
    ok
    Test Deploy Virtual Machine ... === TestName: test_deploy_vm | Status : SUCCESS ===
    ok
    Test Multiple Deploy Virtual Machine ... === TestName: test_deploy_vm_multiple | Status : SUCCESS ===
    ok
    Test Stop Virtual Machine ... === TestName: test_01_stop_vm | Status : SUCCESS ===
    ok
    Test Start Virtual Machine ... === TestName: test_02_start_vm | Status : SUCCESS ===
    ok
    Test Reboot Virtual Machine ... === TestName: test_03_reboot_vm | Status : SUCCESS ===
    ok
    Test destroy Virtual Machine ... === TestName: test_06_destroy_vm | Status : SUCCESS ===
    ok
    Test recover Virtual Machine ... === TestName: test_07_restore_vm | Status : SUCCESS ===
    ok
    Test migrate VM ... === TestName: test_08_migrate_vm | Status : SUCCESS ===
    ok
    Test destroy(expunge) Virtual Machine ... === TestName: test_09_expunge_vm | Status : SUCCESS ===
    ok
    Test to create service offering ... === TestName: test_01_create_service_offering | Status : SUCCESS ===
    ok
    Test to update existing service offering ... === TestName: test_02_edit_service_offering | Status : SUCCESS ===
    ok
    Test to delete service offering ... === TestName: test_03_delete_service_offering | Status : SUCCESS ===
    ok
    Test for delete account ... === TestName: test_delete_account | Status : SUCCESS ===
    ok
    Test for Associate/Disassociate public IP address for admin account ... === TestName: test_public_ip_admin_account | Status : SUCCESS ===
    ok
    Test for Associate/Disassociate public IP address for user account ... === TestName: test_public_ip_user_account | Status : SUCCESS ===
    ok
    Test for release public IP address ... === TestName: test_releaseIP | Status : SUCCESS ===
    ok
    Test create VPC offering ... === TestName: test_01_create_vpc_offering | Status : SUCCESS ===
    ok
    Test VPC offering without load balancing service ... === TestName: test_03_vpc_off_without_lb | Status : SUCCESS ===
    ok
    Test VPC offering without static NAT service ... === TestName: test_04_vpc_off_without_static_nat | Status : SUCCESS ===
    ok
    Test VPC offering without port forwarding service ... === TestName: test_05_vpc_off_without_pf | Status : SUCCESS ===
    ok
    Test VPC offering with invalid services ... === TestName: test_06_vpc_off_invalid_services | Status : SUCCESS ===
    ok
    Test update VPC offering ... === TestName: test_07_update_vpc_off | Status : SUCCESS ===
    ok
    Test list VPC offering ... === TestName: test_08_list_vpc_off | Status : SUCCESS ===
    test_09_create_redundant_vpc_offering (integration.component.test_vpc_offerings.TestVPCOffering) ... === TestName: test_09_create_redundant_vpc_offering | Status : SUCCESS ===
    ok
    Test start/stop of router after addition of one guest network ... === TestName: test_01_start_stop_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test reboot of router after addition of one guest network ... === TestName: test_02_reboot_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test to change service offering of router after addition of one guest network ... === TestName: test_04_chg_srv_off_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test destroy of router after addition of one guest network ... === TestName: test_05_destroy_router_after_addition_of_one_guest_network | Status : SUCCESS ===
    ok
    Test to stop and start router after creation of VPC ... === TestName: test_01_stop_start_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Test to reboot the router after creating a VPC ... === TestName: test_02_reboot_router_after_creating_vpc | Status : SUCCESS ===
    ok
    Tests to change service offering of the Router after ... === TestName: test_04_change_service_offerring_vpc | Status : SUCCESS ===
    ok
    Test to destroy the router after creating a VPC ... === TestName: test_05_destroy_router_after_creating_vpc | Status : SUCCESS ===
    ok
    
    ----------------------------------------------------------------------
    Ran 42 tests in 8486.536s
    
    OK
    ```
    
    Haven't tested the actual functionality, this just shows you didn't break anything else.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack issue #1180: CLOUDSTACK-9112: Deploy VM failing frequently due to...

Posted by cloudmonger <gi...@git.apache.org>.
Github user cloudmonger commented on the issue:

    https://github.com/apache/cloudstack/pull/1180
  
    ### ACS CI BVT Run
     **Sumarry:**
     Build Number 420
     Hypervisor xenserver
     NetworkType Advanced
     Passed=102
     Failed=3
     Skipped=7
    
    _Link to logs Folder (search by build_no):_ https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0
    
    
    **Failed tests:**
    * test_routers_network_ops.py
    
     * test_02_isolate_network_FW_PF_default_routes_egress_false Failed
    
     * test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failed
    
     * test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failing since 2 runs
    
    
    **Skipped tests:**
    test_01_test_vm_volume_snapshot
    test_vm_nic_adapter_vmxnet3
    test_static_role_account_acls
    test_11_ss_nfs_version_on_ssvm
    test_nested_virtualization_vmware
    test_3d_gpu_support
    test_deploy_vgpu_enabled_vm
    
    **Passed test suits:**
    test_deploy_vm_with_userdata.py
    test_affinity_groups_projects.py
    test_portable_publicip.py
    test_over_provisioning.py
    test_global_settings.py
    test_scale_vm.py
    test_service_offerings.py
    test_routers_iptables_default_policy.py
    test_loadbalance.py
    test_routers.py
    test_reset_vm_on_reboot.py
    test_deploy_vms_with_varied_deploymentplanners.py
    test_network.py
    test_router_dns.py
    test_non_contigiousvlan.py
    test_login.py
    test_deploy_vm_iso.py
    test_list_ids_parameter.py
    test_public_ip_range.py
    test_multipleips_per_nic.py
    test_regions.py
    test_affinity_groups.py
    test_network_acl.py
    test_pvlan.py
    test_volumes.py
    test_nic.py
    test_deploy_vm_root_resize.py
    test_resource_detail.py
    test_secondary_storage.py
    test_vm_life_cycle.py
    test_disk_offerings.py


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack issue #1180: CLOUDSTACK-9112: Deploy VM failing frequently due to...

Posted by harikrishna-patnala <gi...@git.apache.org>.
Github user harikrishna-patnala commented on the issue:

    https://github.com/apache/cloudstack/pull/1180
  
    Added test for the case where we need to list only clusters which has hosts with host tag specified in the offering.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9112: Deploy VM failing freque...

Posted by rhtyd <gi...@git.apache.org>.
Github user rhtyd commented on the pull request:

    https://github.com/apache/cloudstack/pull/1180#issuecomment-216206438
  
    @harikrishna-patnala can you rebase against latest master, update on tests etc and share state of your PR, thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack issue #1180: CLOUDSTACK-9112: Deploy VM failing frequently due to...

Posted by cloudmonger <gi...@git.apache.org>.
Github user cloudmonger commented on the issue:

    https://github.com/apache/cloudstack/pull/1180
  
    ### ACS CI BVT Run
     **Sumarry:**
     Build Number 449
     Hypervisor xenserver
     NetworkType Advanced
     Passed=105
     Failed=0
     Skipped=7
    
    _Link to logs Folder (search by build_no):_ https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0
    
    
    **Failed tests:**
    
    **Skipped tests:**
    test_01_test_vm_volume_snapshot
    test_vm_nic_adapter_vmxnet3
    test_static_role_account_acls
    test_11_ss_nfs_version_on_ssvm
    test_nested_virtualization_vmware
    test_3d_gpu_support
    test_deploy_vgpu_enabled_vm
    
    **Passed test suits:**
    test_deploy_vm_with_userdata.py
    test_affinity_groups_projects.py
    test_portable_publicip.py
    test_over_provisioning.py
    test_global_settings.py
    test_scale_vm.py
    test_service_offerings.py
    test_routers_iptables_default_policy.py
    test_loadbalance.py
    test_routers.py
    test_reset_vm_on_reboot.py
    test_deploy_vms_with_varied_deploymentplanners.py
    test_network.py
    test_router_dns.py
    test_non_contigiousvlan.py
    test_login.py
    test_deploy_vm_iso.py
    test_list_ids_parameter.py
    test_public_ip_range.py
    test_multipleips_per_nic.py
    test_regions.py
    test_affinity_groups.py
    test_network_acl.py
    test_pvlan.py
    test_volumes.py
    test_nic.py
    test_deploy_vm_root_resize.py
    test_resource_detail.py
    test_secondary_storage.py
    test_vm_life_cycle.py
    test_routers_network_ops.py
    test_disk_offerings.py


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9112: Deploy VM failing freque...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1180#issuecomment-162821748
  
    Hi @harikrishna-patnala 
    
    Anyway we can test this? Is there any integration test that we can use to cover your changes? Concerning the new methods you introduced, any chance to get some unit tests to cover the logic?
    
    Could you describe which steps you followed to test those changes? Would be nice to come up with some strategy that will help reviewers to test your changes.
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9112: Deploy VM failing freque...

Posted by harikrishna-patnala <gi...@git.apache.org>.
Github user harikrishna-patnala commented on the pull request:

    https://github.com/apache/cloudstack/pull/1180#issuecomment-163112385
  
    Thank you for the feedback, I'll add the unit tests for the new methods and integration tests for the changes that I have made.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] cloudstack pull request: CLOUDSTACK-9112: Deploy VM failing freque...

Posted by wilderrodrigues <gi...@git.apache.org>.
Github user wilderrodrigues commented on the pull request:

    https://github.com/apache/cloudstack/pull/1180#issuecomment-162994057
  
    @remibergsma 
    
    It doesn't break things covered by the tests that were execute, which is fine. However, @harikrishna-patnala hasn't provided any support in terms of how to test it; if there is any integration test to help; or about the unit tests for the new methods he introduced.
    
    I'm not very keen to LGTM a PR without support from the author himself. Unless he comes back with some more details, I will hold a :-1:  for this PR.
    
    Cheers,
    Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---