You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cloudstack.apache.org by Koushik Das <ko...@citrix.com> on 2013/10/15 18:19:24 UTC

Possible bug in DeploymentPlanner?

I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.

Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?

            while (true) {
                if (planner instanceof DeploymentClusterPlanner) {
                    ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
                            avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                            avoids.getPoolsToAvoid());

                    clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
                    if (clusterList != null && !clusterList.isEmpty()) {
                        // planner refactoring. call allocators to list hosts
                        ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
                                avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                                avoids.getPoolsToAvoid());

                        resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);

                        dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
                                getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
                        if (dest != null) {
                            return dest;
                        }
                        // reset the avoid input to the planners
                        resetAvoidSet(avoids, plannerAvoidOutput);

                    } else {
                        return null;
                    }
                } else {
…………
…………
                }
            }







RE: Possible bug in DeploymentPlanner?

Posted by Prachi Damle <Pr...@citrix.com>.
-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com] 
Sent: Thursday, October 17, 2013 11:50 AM
To: <de...@cloudstack.apache.org>
Subject: Re: Possible bug in DeploymentPlanner?


On 17-Oct-2013, at 11:29 PM, Prachi Damle <Pr...@citrix.com>
 wrote:

> Koushik,
> 
> I don't think you should add this check before calling allocators. It is the allocators output that should tell that local storage pools were not found. 
> For your case where you need to put cluster in avoid set when no local pool is present:
> a) Only local storage allocator should get invoked, all other allocators need to have a check at the start saying they can process only 'shared' storage request - This will make sure you do not get any pools from any other allocator for your local storage request.

[Koushik] I made the changes for this and after this the planner bug got uncovered where it went in a infinite loop trying to deploy VM with local disk when no local pool is available.
 
> b) canAvoidCluster() need to be changed to compare only 'local' or 'shared' pool set to the avoid output of the allocators to see if all pools are in avoid set - Or if there are no pools then  directly avoid the cluster.

[Koushik] Ok. I will make the necessary changes for this but this closely ties the allocators and the assumption in canAvoidCluster(). The scenario I am seeing is the latter where there is no pools in the allocator avoid set. So as you suggested in this case as well the cluster needs to be in avoid set.

[Prachi] Cool thanks Koushik, this way all the logic to figure out a cluster should be avoided gets handled by canAvoidCluster(), and not in various places. 
 
> Prachi
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com] 
> Sent: Thursday, October 17, 2013 9:23 AM
> To: <de...@cloudstack.apache.org>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> Prachi,
> 
> I am planning to add the following code in findSuitablePoolsForVolumes() method. If the VM has a local disk and the cluster doesn't have a local pool then the cluster will be added to avoid set. Let me know if you see any issues with this.
> 
>            if (useLocalStorage) {
>                List<StoragePoolVO> allPoolsInCluster = _storagePoolDao.findPoolsByTags(plan.getDataCenterId(), plan.getPodId(), plan.getClusterId(), null);
>                boolean localStoragePoolExists = false;
>                for (StoragePoolVO pool : allPoolsInCluster) {
>                    if (pool.isLocal()) {
>                        localStoragePoolExists = true;
>                        break;
>                    }
>                }
>                if (!localStoragePoolExists) {
>                    avoid.addCluster(plan.getClusterId());
>                }
>            }
> 
> -Koushik
> 
> On 17-Oct-2013, at 3:21 AM, Prachi Damle <Pr...@citrix.com>> wrote:
> 
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
> Sent: Wednesday, October 16, 2013 5:21 AM
> To: <de...@cloudstack.apache.org>>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> 
> On 16-Oct-2013, at 3:12 AM, Prachi Damle <Pr...@citrix.com>> wrote:
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
> Sent: Tuesday, October 15, 2013 11:43 AM
> To: <de...@cloudstack.apache.org>>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> 
> Thanks for the explanation Prachi.
> 
> Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.
> 
> To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?
> 
> 
> [Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.
> 
> I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
> Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml
> 
> Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
> Does this work fine for your scenario?
> 
> 
> [Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in avoid set. Only the pools needs to be selectively added to avoid set.
> 
> [Prachi] Allocators add  the resources to avoid set and then DPM can figure out if a cluster is done searching looking at the avoid set, so that the planner need not choose that cluster on next try.
> So two changes are needed: Storage Allocators should set resources in avoid set within their scope and DPM should  check if a cluster is done accordingly.
> Can you let me know the specific bug you want to change allocators for? I will add the above change in a separate ticket if already not done for your bug.
> 
> 
> On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>>
> wrote:
> 
> Koushik,
> 
> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.
> 
> Reasoning how the planners work:
> Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
> When all options of the planner are exhausted they should return an empty cluster list halting the search.
> 
> What should the allocators do?
> Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
> This is necessary for the logic to not enter an infinite loop.
> 
> As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed.
> 
> Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.
> 
> Thanks,
> Prachi
> 
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
> Sent: Tuesday, October 15, 2013 9:19 AM
> To: <de...@cloudstack.apache.org>>
> Subject: Possible bug in DeploymentPlanner?
> 
> I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
> In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.
> 
> Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?
> 
>         while (true) {
>             if (planner instanceof DeploymentClusterPlanner) {
>                 ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                         avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                         avoids.getPoolsToAvoid());
> 
>                 clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>                 if (clusterList != null && !clusterList.isEmpty()) {
>                     // planner refactoring. call allocators to list hosts
>                     ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                             avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                             avoids.getPoolsToAvoid());
> 
>                     resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
> 
>                     dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>                             getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>                     if (dest != null) {
>                         return dest;
>                     }
>                     // reset the avoid input to the planners
>                     resetAvoidSet(avoids, plannerAvoidOutput);
> 
>                 } else {
>                     return null;
>                 }
>             } else {
> ............
> ............
>             }
>         }
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: Possible bug in DeploymentPlanner?

Posted by Koushik Das <ko...@citrix.com>.
On 17-Oct-2013, at 11:29 PM, Prachi Damle <Pr...@citrix.com>
 wrote:

> Koushik,
> 
> I don't think you should add this check before calling allocators. It is the allocators output that should tell that local storage pools were not found. 
> For your case where you need to put cluster in avoid set when no local pool is present:
> a) Only local storage allocator should get invoked, all other allocators need to have a check at the start saying they can process only 'shared' storage request - This will make sure you do not get any pools from any other allocator for your local storage request.

[Koushik] I made the changes for this and after this the planner bug got uncovered where it went in a infinite loop trying to deploy VM with local disk when no local pool is available.
 
> b) canAvoidCluster() need to be changed to compare only 'local' or 'shared' pool set to the avoid output of the allocators to see if all pools are in avoid set - Or if there are no pools then  directly avoid the cluster.

[Koushik] Ok. I will make the necessary changes for this but this closely ties the allocators and the assumption in canAvoidCluster(). The scenario I am seeing is the latter where there is no pools in the allocator avoid set. So as you suggested in this case as well the cluster needs to be in avoid set.

> 
> Prachi
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com] 
> Sent: Thursday, October 17, 2013 9:23 AM
> To: <de...@cloudstack.apache.org>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> Prachi,
> 
> I am planning to add the following code in findSuitablePoolsForVolumes() method. If the VM has a local disk and the cluster doesn't have a local pool then the cluster will be added to avoid set. Let me know if you see any issues with this.
> 
>            if (useLocalStorage) {
>                List<StoragePoolVO> allPoolsInCluster = _storagePoolDao.findPoolsByTags(plan.getDataCenterId(), plan.getPodId(), plan.getClusterId(), null);
>                boolean localStoragePoolExists = false;
>                for (StoragePoolVO pool : allPoolsInCluster) {
>                    if (pool.isLocal()) {
>                        localStoragePoolExists = true;
>                        break;
>                    }
>                }
>                if (!localStoragePoolExists) {
>                    avoid.addCluster(plan.getClusterId());
>                }
>            }
> 
> -Koushik
> 
> On 17-Oct-2013, at 3:21 AM, Prachi Damle <Pr...@citrix.com>> wrote:
> 
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
> Sent: Wednesday, October 16, 2013 5:21 AM
> To: <de...@cloudstack.apache.org>>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> 
> On 16-Oct-2013, at 3:12 AM, Prachi Damle <Pr...@citrix.com>> wrote:
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
> Sent: Tuesday, October 15, 2013 11:43 AM
> To: <de...@cloudstack.apache.org>>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> 
> Thanks for the explanation Prachi.
> 
> Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.
> 
> To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?
> 
> 
> [Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.
> 
> I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
> Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml
> 
> Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
> Does this work fine for your scenario?
> 
> 
> [Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in avoid set. Only the pools needs to be selectively added to avoid set.
> 
> [Prachi] Allocators add  the resources to avoid set and then DPM can figure out if a cluster is done searching looking at the avoid set, so that the planner need not choose that cluster on next try.
> So two changes are needed: Storage Allocators should set resources in avoid set within their scope and DPM should  check if a cluster is done accordingly.
> Can you let me know the specific bug you want to change allocators for? I will add the above change in a separate ticket if already not done for your bug.
> 
> 
> On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>>
> wrote:
> 
> Koushik,
> 
> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.
> 
> Reasoning how the planners work:
> Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
> When all options of the planner are exhausted they should return an empty cluster list halting the search.
> 
> What should the allocators do?
> Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
> This is necessary for the logic to not enter an infinite loop.
> 
> As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed.
> 
> Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.
> 
> Thanks,
> Prachi
> 
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
> Sent: Tuesday, October 15, 2013 9:19 AM
> To: <de...@cloudstack.apache.org>>
> Subject: Possible bug in DeploymentPlanner?
> 
> I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
> In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.
> 
> Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?
> 
>         while (true) {
>             if (planner instanceof DeploymentClusterPlanner) {
>                 ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                         avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                         avoids.getPoolsToAvoid());
> 
>                 clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>                 if (clusterList != null && !clusterList.isEmpty()) {
>                     // planner refactoring. call allocators to list hosts
>                     ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                             avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                             avoids.getPoolsToAvoid());
> 
>                     resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
> 
>                     dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>                             getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>                     if (dest != null) {
>                         return dest;
>                     }
>                     // reset the avoid input to the planners
>                     resetAvoidSet(avoids, plannerAvoidOutput);
> 
>                 } else {
>                     return null;
>                 }
>             } else {
> ............
> ............
>             }
>         }
> 
> 
> 
> 
> 
> 
> 
> 
> 


RE: Possible bug in DeploymentPlanner?

Posted by Prachi Damle <Pr...@citrix.com>.
Koushik,

I don't think you should add this check before calling allocators. It is the allocators output that should tell that local storage pools were not found. 
For your case where you need to put cluster in avoid set when no local pool is present:
a) Only local storage allocator should get invoked, all other allocators need to have a check at the start saying they can process only 'shared' storage request - This will make sure you do not get any pools from any other allocator for your local storage request.
b) canAvoidCluster() need to be changed to compare only 'local' or 'shared' pool set to the avoid output of the allocators to see if all pools are in avoid set - Or if there are no pools then  directly avoid the cluster.

Prachi

-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com] 
Sent: Thursday, October 17, 2013 9:23 AM
To: <de...@cloudstack.apache.org>
Subject: Re: Possible bug in DeploymentPlanner?

Prachi,

I am planning to add the following code in findSuitablePoolsForVolumes() method. If the VM has a local disk and the cluster doesn't have a local pool then the cluster will be added to avoid set. Let me know if you see any issues with this.

            if (useLocalStorage) {
                List<StoragePoolVO> allPoolsInCluster = _storagePoolDao.findPoolsByTags(plan.getDataCenterId(), plan.getPodId(), plan.getClusterId(), null);
                boolean localStoragePoolExists = false;
                for (StoragePoolVO pool : allPoolsInCluster) {
                    if (pool.isLocal()) {
                        localStoragePoolExists = true;
                        break;
                    }
                }
                if (!localStoragePoolExists) {
                    avoid.addCluster(plan.getClusterId());
                }
            }

-Koushik

On 17-Oct-2013, at 3:21 AM, Prachi Damle <Pr...@citrix.com>> wrote:



-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
Sent: Wednesday, October 16, 2013 5:21 AM
To: <de...@cloudstack.apache.org>>
Subject: Re: Possible bug in DeploymentPlanner?


On 16-Oct-2013, at 3:12 AM, Prachi Damle <Pr...@citrix.com>> wrote:


-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
Sent: Tuesday, October 15, 2013 11:43 AM
To: <de...@cloudstack.apache.org>>
Subject: Re: Possible bug in DeploymentPlanner?


Thanks for the explanation Prachi.

Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.

To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?


[Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.

I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml

Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
Does this work fine for your scenario?


[Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in avoid set. Only the pools needs to be selectively added to avoid set.

[Prachi] Allocators add  the resources to avoid set and then DPM can figure out if a cluster is done searching looking at the avoid set, so that the planner need not choose that cluster on next try.
So two changes are needed: Storage Allocators should set resources in avoid set within their scope and DPM should  check if a cluster is done accordingly.
Can you let me know the specific bug you want to change allocators for? I will add the above change in a separate ticket if already not done for your bug.


On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>>
wrote:

Koushik,

The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.

Reasoning how the planners work:
Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
When all options of the planner are exhausted they should return an empty cluster list halting the search.

What should the allocators do?
Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
This is necessary for the logic to not enter an infinite loop.

As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed.

Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.

Thanks,
Prachi



-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
Sent: Tuesday, October 15, 2013 9:19 AM
To: <de...@cloudstack.apache.org>>
Subject: Possible bug in DeploymentPlanner?

I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.

Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?

         while (true) {
             if (planner instanceof DeploymentClusterPlanner) {
                 ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
                         avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                         avoids.getPoolsToAvoid());

                 clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
                 if (clusterList != null && !clusterList.isEmpty()) {
                     // planner refactoring. call allocators to list hosts
                     ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
                             avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                             avoids.getPoolsToAvoid());

                     resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);

                     dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
                             getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
                     if (dest != null) {
                         return dest;
                     }
                     // reset the avoid input to the planners
                     resetAvoidSet(avoids, plannerAvoidOutput);

                 } else {
                     return null;
                 }
             } else {
............
............
             }
         }










Re: Possible bug in DeploymentPlanner?

Posted by Koushik Das <ko...@citrix.com>.
Prachi,

I am planning to add the following code in findSuitablePoolsForVolumes() method. If the VM has a local disk and the cluster doesn't have a local pool then the cluster will be added to avoid set. Let me know if you see any issues with this.

            if (useLocalStorage) {
                List<StoragePoolVO> allPoolsInCluster = _storagePoolDao.findPoolsByTags(plan.getDataCenterId(), plan.getPodId(), plan.getClusterId(), null);
                boolean localStoragePoolExists = false;
                for (StoragePoolVO pool : allPoolsInCluster) {
                    if (pool.isLocal()) {
                        localStoragePoolExists = true;
                        break;
                    }
                }
                if (!localStoragePoolExists) {
                    avoid.addCluster(plan.getClusterId());
                }
            }

-Koushik

On 17-Oct-2013, at 3:21 AM, Prachi Damle <Pr...@citrix.com>> wrote:



-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
Sent: Wednesday, October 16, 2013 5:21 AM
To: <de...@cloudstack.apache.org>>
Subject: Re: Possible bug in DeploymentPlanner?


On 16-Oct-2013, at 3:12 AM, Prachi Damle <Pr...@citrix.com>> wrote:


-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
Sent: Tuesday, October 15, 2013 11:43 AM
To: <de...@cloudstack.apache.org>>
Subject: Re: Possible bug in DeploymentPlanner?


Thanks for the explanation Prachi.

Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.

To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?


[Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.

I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml

Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
Does this work fine for your scenario?


[Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in avoid set. Only the pools needs to be selectively added to avoid set.

[Prachi] Allocators add  the resources to avoid set and then DPM can figure out if a cluster is done searching looking at the avoid set, so that the planner need not choose that cluster on next try.
So two changes are needed: Storage Allocators should set resources in avoid set within their scope and DPM should  check if a cluster is done accordingly.
Can you let me know the specific bug you want to change allocators for? I will add the above change in a separate ticket if already not done for your bug.


On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>>
wrote:

Koushik,

The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.

Reasoning how the planners work:
Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
When all options of the planner are exhausted they should return an empty cluster list halting the search.

What should the allocators do?
Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
This is necessary for the logic to not enter an infinite loop.

As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed.

Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.

Thanks,
Prachi



-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com<http://citrix.com>]
Sent: Tuesday, October 15, 2013 9:19 AM
To: <de...@cloudstack.apache.org>>
Subject: Possible bug in DeploymentPlanner?

I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.

Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?

         while (true) {
             if (planner instanceof DeploymentClusterPlanner) {
                 ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
                         avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                         avoids.getPoolsToAvoid());

                 clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
                 if (clusterList != null && !clusterList.isEmpty()) {
                     // planner refactoring. call allocators to list hosts
                     ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
                             avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                             avoids.getPoolsToAvoid());

                     resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);

                     dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
                             getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
                     if (dest != null) {
                         return dest;
                     }
                     // reset the avoid input to the planners
                     resetAvoidSet(avoids, plannerAvoidOutput);

                 } else {
                     return null;
                 }
             } else {
............
............
             }
         }










RE: Possible bug in DeploymentPlanner?

Posted by Prachi Damle <Pr...@citrix.com>.

-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com] 
Sent: Wednesday, October 16, 2013 5:21 AM
To: <de...@cloudstack.apache.org>
Subject: Re: Possible bug in DeploymentPlanner?


On 16-Oct-2013, at 3:12 AM, Prachi Damle <Pr...@citrix.com> wrote:

> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com] 
> Sent: Tuesday, October 15, 2013 11:43 AM
> To: <de...@cloudstack.apache.org>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> 
> Thanks for the explanation Prachi.
> 
> Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.
> 
> To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?
> 
> 
> [Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.
> 
> I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
> Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml
> 
> Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
> Does this work fine for your scenario?
> 
> 
[Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in avoid set. Only the pools needs to be selectively added to avoid set.

[Prachi] Allocators add  the resources to avoid set and then DPM can figure out if a cluster is done searching looking at the avoid set, so that the planner need not choose that cluster on next try.
So two changes are needed: Storage Allocators should set resources in avoid set within their scope and DPM should  check if a cluster is done accordingly. 
Can you let me know the specific bug you want to change allocators for? I will add the above change in a separate ticket if already not done for your bug.

> 
> On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>
> wrote:
> 
>> Koushik,
>> 
>> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.
>> 
>> Reasoning how the planners work:
>> Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
>> When all options of the planner are exhausted they should return an empty cluster list halting the search.
>> 
>> What should the allocators do?
>> Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
>> This is necessary for the logic to not enter an infinite loop.
>> 
>> As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed. 
>> 
>> Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.
>> 
>> Thanks,
>> Prachi
>> 
>> 
>> 
>> -----Original Message-----
>> From: Koushik Das [mailto:koushik.das@citrix.com] 
>> Sent: Tuesday, October 15, 2013 9:19 AM
>> To: <de...@cloudstack.apache.org>
>> Subject: Possible bug in DeploymentPlanner?
>> 
>> I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
>> In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.
>> 
>> Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?
>> 
>>           while (true) {
>>               if (planner instanceof DeploymentClusterPlanner) {
>>                   ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>>                           avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>>                           avoids.getPoolsToAvoid());
>> 
>>                   clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>>                   if (clusterList != null && !clusterList.isEmpty()) {
>>                       // planner refactoring. call allocators to list hosts
>>                       ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>>                               avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>>                               avoids.getPoolsToAvoid());
>> 
>>                       resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
>> 
>>                       dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>>                               getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>>                       if (dest != null) {
>>                           return dest;
>>                       }
>>                       // reset the avoid input to the planners
>>                       resetAvoidSet(avoids, plannerAvoidOutput);
>> 
>>                   } else {
>>                       return null;
>>                   }
>>               } else {
>> ............
>> ............
>>               }
>>           }
>> 
>> 
>> 
>> 
>> 
>> 
> 


Re: Possible bug in DeploymentPlanner?

Posted by Koushik Das <ko...@citrix.com>.
On 16-Oct-2013, at 3:12 AM, Prachi Damle <Pr...@citrix.com> wrote:

> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com] 
> Sent: Tuesday, October 15, 2013 11:43 AM
> To: <de...@cloudstack.apache.org>
> Subject: Re: Possible bug in DeploymentPlanner?
> 
> 
> Thanks for the explanation Prachi.
> 
> Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.
> 
> To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?
> 
> 
> [Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.
> 
> I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
> Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml
> 
> Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
> Does this work fine for your scenario?
> 
> 
[Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in avoid set. Only the pools needs to be selectively added to avoid set.

> 
> On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>
> wrote:
> 
>> Koushik,
>> 
>> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.
>> 
>> Reasoning how the planners work:
>> Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
>> When all options of the planner are exhausted they should return an empty cluster list halting the search.
>> 
>> What should the allocators do?
>> Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
>> This is necessary for the logic to not enter an infinite loop.
>> 
>> As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed. 
>> 
>> Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.
>> 
>> Thanks,
>> Prachi
>> 
>> 
>> 
>> -----Original Message-----
>> From: Koushik Das [mailto:koushik.das@citrix.com] 
>> Sent: Tuesday, October 15, 2013 9:19 AM
>> To: <de...@cloudstack.apache.org>
>> Subject: Possible bug in DeploymentPlanner?
>> 
>> I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
>> In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.
>> 
>> Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?
>> 
>>           while (true) {
>>               if (planner instanceof DeploymentClusterPlanner) {
>>                   ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>>                           avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>>                           avoids.getPoolsToAvoid());
>> 
>>                   clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>>                   if (clusterList != null && !clusterList.isEmpty()) {
>>                       // planner refactoring. call allocators to list hosts
>>                       ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>>                               avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>>                               avoids.getPoolsToAvoid());
>> 
>>                       resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
>> 
>>                       dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>>                               getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>>                       if (dest != null) {
>>                           return dest;
>>                       }
>>                       // reset the avoid input to the planners
>>                       resetAvoidSet(avoids, plannerAvoidOutput);
>> 
>>                   } else {
>>                       return null;
>>                   }
>>               } else {
>> ............
>> ............
>>               }
>>           }
>> 
>> 
>> 
>> 
>> 
>> 
> 


RE: Possible bug in DeploymentPlanner?

Posted by Prachi Damle <Pr...@citrix.com>.
-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com] 
Sent: Tuesday, October 15, 2013 11:43 AM
To: <de...@cloudstack.apache.org>
Subject: Re: Possible bug in DeploymentPlanner?


Thanks for the explanation Prachi.

Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.

To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?


[Prachi] I see the bug you are thinking about - deploymentPlanningManager :: canAvoidCluster  is only checking the shared storages and thus does not put the cluster in avoid set in case the local storage is not a match for the host.

I think the storage allocators should put the resources not considered, in avoid set, that are within their scope. Also they should do so only when they are supposed to handle the request. Thus local allocator sees only local pools, so it should place them into avoid set if not considered and only when this allocator is supposed to handle the scenario. While the zone-wide/cluster-wide should set those within their scope.
Also, the allocators have a defined order in which they get invoked and we need to maintain the order defined in the componentContext.xml

Now to solve the bug, the deploymentPlanningManager :: canAvoidCluster  needs to consider which scenario applies for the given vmprofile - i.e whether its local storage or shared to correctly figure out if cluster can be avoided by considering applicable pools only.
Does this work fine for your scenario?



On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>
 wrote:

> Koushik,
> 
> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.
> 
> Reasoning how the planners work:
> Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
> When all options of the planner are exhausted they should return an empty cluster list halting the search.
> 
> What should the allocators do?
> Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
> This is necessary for the logic to not enter an infinite loop.
> 
> As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed. 
> 
> Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.
> 
> Thanks,
> Prachi
> 
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com] 
> Sent: Tuesday, October 15, 2013 9:19 AM
> To: <de...@cloudstack.apache.org>
> Subject: Possible bug in DeploymentPlanner?
> 
> I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
> In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.
> 
> Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?
> 
>            while (true) {
>                if (planner instanceof DeploymentClusterPlanner) {
>                    ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                            avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                            avoids.getPoolsToAvoid());
> 
>                    clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>                    if (clusterList != null && !clusterList.isEmpty()) {
>                        // planner refactoring. call allocators to list hosts
>                        ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                                avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                                avoids.getPoolsToAvoid());
> 
>                        resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
> 
>                        dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>                                getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>                        if (dest != null) {
>                            return dest;
>                        }
>                        // reset the avoid input to the planners
>                        resetAvoidSet(avoids, plannerAvoidOutput);
> 
>                    } else {
>                        return null;
>                    }
>                } else {
> ............
> ............
>                }
>            }
> 
> 
> 
> 
> 
> 


Re: Possible bug in DeploymentPlanner?

Posted by Koushik Das <ko...@citrix.com>.
Thanks for the explanation Prachi.

Here is one issue I see with what you explained below. If an allocator is not suppose to handle a particular scenario how should it deal with the avoid set. Should it put all available pools in the avoid set. In that case subsequent allocators won't get a chance to do anything as all pools are in avoid set. If more than one allocator can operate on a particular pool then the situation becomes tricky.

To give an example, if there is a request for volume on shared storage and first allocator to get called is the local allocator. Since it is not going to handle this case it will return empty/null list. Now should it put all the other shared pools in avoid set or only the local pools in avoid set? Similarly if the request is for a local volume and the request first goes to zone or cluster-wide storage allocator what should be there in the avoid set?



On 15-Oct-2013, at 11:31 PM, Prachi Damle <Pr...@citrix.com>
 wrote:

> Koushik,
> 
> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.
> 
> Reasoning how the planners work:
> Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
> When all options of the planner are exhausted they should return an empty cluster list halting the search.
> 
> What should the allocators do?
> Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
> This is necessary for the logic to not enter an infinite loop.
> 
> As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed. 
> 
> Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.
> 
> Thanks,
> Prachi
> 
> 
> 
> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com] 
> Sent: Tuesday, October 15, 2013 9:19 AM
> To: <de...@cloudstack.apache.org>
> Subject: Possible bug in DeploymentPlanner?
> 
> I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
> In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.
> 
> Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?
> 
>            while (true) {
>                if (planner instanceof DeploymentClusterPlanner) {
>                    ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                            avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                            avoids.getPoolsToAvoid());
> 
>                    clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>                    if (clusterList != null && !clusterList.isEmpty()) {
>                        // planner refactoring. call allocators to list hosts
>                        ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>                                avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>                                avoids.getPoolsToAvoid());
> 
>                        resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
> 
>                        dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>                                getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>                        if (dest != null) {
>                            return dest;
>                        }
>                        // reset the avoid input to the planners
>                        resetAvoidSet(avoids, plannerAvoidOutput);
> 
>                    } else {
>                        return null;
>                    }
>                } else {
> ............
> ............
>                }
>            }
> 
> 
> 
> 
> 
> 


RE: Possible bug in DeploymentPlanner?

Posted by Prachi Damle <Pr...@citrix.com>.
Koushik,

The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed a list of clusters the allocators should look into to find a potential destination. While the allocators should provide a avoid set back to the planners consisting of any resource not considered.

Reasoning how the planners work:
Planners can set/reset clusters into avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
When all options of the planner are exhausted they should return an empty cluster list halting the search.

What should the allocators do?
Iterate over the given cluster list and in each cluster find out suitable resources - all other resources not considered AND not suitable MUST be added to the avoid set so that the planners get the correct avoid input.
This is necessary for the logic to not enter an infinite loop.

As you see, only planners can halt the search process by referring to the avoid set provided by the allocators. If you see any case not following this, that needs to be fixed. 

Also, I do think in general it will be safe to add a configurable retry limit on this loop to have control in case any new logic in allocators don't follow this reasoning. I will add that limit.

Thanks,
Prachi



-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com] 
Sent: Tuesday, October 15, 2013 9:19 AM
To: <de...@cloudstack.apache.org>
Subject: Possible bug in DeploymentPlanner?

I was making some changes in the storage pool allocators related to some bug fix and came across this code snippet in planDeplyment() method of DeploymentPlanningManagerImpl.java.
In this if the checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators) then the while loop will never terminate. Is there any  assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where this will not get updated.

Wanted to understand the reason for doing it this way? Can the while (true) condition be replaced with something more intuitive?

            while (true) {
                if (planner instanceof DeploymentClusterPlanner) {
                    ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
                            avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                            avoids.getPoolsToAvoid());

                    clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
                    if (clusterList != null && !clusterList.isEmpty()) {
                        // planner refactoring. call allocators to list hosts
                        ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
                                avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
                                avoids.getPoolsToAvoid());

                        resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);

                        dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
                                getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
                        if (dest != null) {
                            return dest;
                        }
                        // reset the avoid input to the planners
                        resetAvoidSet(avoids, plannerAvoidOutput);

                    } else {
                        return null;
                    }
                } else {
............
............
                }
            }