Posted to dev@cloudstack.apache.org by Koushik Das <ko...@citrix.com> on 2013/02/01 10:25:35 UTC

Re: Review Request: CLOUDSTACK-606: Starting VM fails with 'ConcurrentOperationException' in a clustered MS scenario

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9133/
-----------------------------------------------------------

(Updated Feb. 1, 2013, 9:25 a.m.)


Review request for cloudstack, Abhinandan Prateek and Alex Huang.


Changes
-------

Updated exception message as the fix is not merged yet.


Description
-------

The issue occurs intermittently when the hosts in a cluster get distributed across multiple management servers (MS). A cluster can get split in the following scenarios:
    a. Add host – The MS on which add host is executed takes ownership of the host. So if 2 hosts belonging to the same cluster are added from 2 different MSs, the cluster gets split.
    b. scanDirectAgentToLoad – This runs every 90 seconds and checks whether any hosts need to be reconnected. The current host scan logic can also lead to a split.
    
    The idea is to fix (b) to ensure that all hosts in a cluster are managed by the same MS. For (a), only the database entry is created, unless the host being added is the first in the cluster (in which case agent creation happens at the same time); (b) then takes care of the connection and agent creation. Since addHost now only creates an entry in the db, there is a small window during which the host state is shown as 'Alert', until (b) is scheduled and picks up the host to make a connection. The MS doing the add host immediately schedules a scan task and also notifies its peers to start the scan task.
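
To make the intended ownership rule concrete, here is a minimal sketch of the scan decision described above. It is not the code in this patch: the class, the Host type and the method names (ClusterOwnershipSketch, findClusterOwner, scanHostsToLoad) are hypothetical, the host table is modelled as an in-memory list, and only the "unowned host" case is covered (rebalancing an already split cluster is left out).

```java
import java.util.ArrayList;
import java.util.List;

class ClusterOwnershipSketch {

    /** Hypothetical stand-in for a host row: ownerMsId is null while no MS owns the host. */
    static final class Host {
        final long id;
        final long clusterId;
        Long ownerMsId;

        Host(long id, long clusterId, Long ownerMsId) {
            this.id = id;
            this.clusterId = clusterId;
            this.ownerMsId = ownerMsId;
        }
    }

    /** Returns the id of an MS that already owns a host in the cluster, or null if none does. */
    static Long findClusterOwner(List<Host> hosts, long clusterId) {
        for (Host h : hosts) {
            if (h.clusterId == clusterId && h.ownerMsId != null) {
                return h.ownerMsId;
            }
        }
        return null;
    }

    /**
     * One scan pass executed by the MS with id selfMsId (assumed behaviour).
     * An unowned host is claimed only if its cluster has no owner yet or is
     * already owned by this MS; otherwise it is left for the owning MS.
     * Returns the ids of the hosts claimed in this pass.
     */
    static List<Long> scanHostsToLoad(List<Host> hosts, long selfMsId) {
        List<Long> claimed = new ArrayList<>();
        for (Host h : hosts) {
            if (h.ownerMsId != null) {
                continue; // already connected through some MS
            }
            Long owner = findClusterOwner(hosts, h.clusterId);
            if (owner == null || owner == selfMsId) {
                h.ownerMsId = selfMsId;
                claimed.add(h.id);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<Host> hosts = new ArrayList<>();
        hosts.add(new Host(1, 10, 1L));   // hostA: first host in cluster 10, owned by MS1
        hosts.add(new Host(2, 10, null)); // hostB: added via MS2, only a db entry so far
        System.out.println("MS2 claims: " + scanHostsToLoad(hosts, 2L)); // prints "MS2 claims: []"
        System.out.println("MS1 claims: " + scanHostsToLoad(hosts, 1L)); // prints "MS1 claims: [2]"
    }
}
```

In this sketch hostB stays unowned (and would show as 'Alert') until the scan runs on MS1, which is why notifying the peers to start their scan right after add host shortens that window.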


This addresses bug CLOUDSTACK-606.


Diffs (updated)
-----

  api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION 
  server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c 
  server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88 
  server/src/com/cloud/host/dao/HostDaoImpl.java 0881675 
  server/src/com/cloud/resource/ResourceManagerImpl.java f82424a 

Diff: https://reviews.apache.org/r/9133/diff/


Testing
-------

Manually tested the following scenarios:

- Added hostA to cluster1 from MS1; it is owned by MS1 as the first host in the cluster. Added hostB to the same cluster1 from MS2. Once both hosts are in the 'Up' state, verified that they are owned by the same MS (i.e. MS1).
- Error scenarios where a host goes to the disconnected, alert or down state (host disconnected from the network) and is then reconnected (connected to the network). Verified that, once reconnected, the host is owned by the same MS as the other hosts in the cluster.
- A scenario where hosts are already in a distributed state (before the fix, hosts were added to the same cluster from different MSs). Verified that after applying the patch and restarting the MSs, the distribution happens properly.
- Did basic validation in a single MS setup: added multiple hosts to a cluster and created VMs on them.


Thanks,

Koushik Das


Re: Review Request: CLOUDSTACK-606: Starting VM fails with 'ConcurrentOperationException' in a clustered MS scenario

Posted by Nitin Mehta <Ni...@citrix.com>.
Submitted with 
commit 777147ce8a47238125a5439f207c225aa9db5304
Author: Koushik Das <ko...@citrix.com>
Date:   Fri Feb 1 15:34:41 2013 +0530


