Posted to users@cloudstack.apache.org by Suresh Sadhu <Su...@citrix.com> on 2012/08/13 13:46:09 UTC

RE: Things to consider for Upcomming releases

Including a few more points.

Hi all,

As I have heard, the upcoming releases involve major architecture changes. It would be good to consider the following items as part of that work, since they would help QA, Support, and customers, and would also reduce the number of support calls.

Please feel free to add any data points I have missed, or further suggestions for improvement, to the list below. Kindly correct me if my assumptions or views are wrong.

Job in waiting state
*****************
We don't set a time limit for job completion, because we don't know how long a particular job will take. Because of this design, if an initial job goes into an infinite loop, the other jobs are queued and wait for the first job to finish.

The only way out of this situation is to manually update the status field in the DB.

Is there a better way to overcome this problem? Please share your views and thoughts.

My thought:
If job priority and job waiting period were configurable parameters, the end user could set or update them based on his needs. Then, even if one job is stuck in the waiting state, the other waiting jobs could be triggered based on priority.

In the current design, if a job is in the waiting state, the end user can't stop it.
With configurable parameters, a job in the waiting (hung) state could be taken out of that state once the configured duration expires.
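
To make the idea concrete, here is a minimal sketch of a serialized job queue with a configurable waiting period. It is only an illustration, not CloudStack code: the names Job, drain_queue, and max_ticks are hypothetical, and time is modelled as abstract "ticks" rather than real seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    # Ticks the job needs to finish; None models a job that never finishes.
    ticks_needed: Optional[int]


def drain_queue(jobs: List[Job], max_ticks: int) -> List[Tuple[str, str]]:
    """Run jobs one at a time, expiring any job that exceeds max_ticks."""
    results = []
    for job in jobs:
        if job.ticks_needed is not None and job.ticks_needed <= max_ticks:
            results.append((job.name, "completed"))
        else:
            # Expire the job instead of letting it block the queue forever.
            results.append((job.name, "expired"))
    return results


# Hypothetical queue: the first job hangs, the others are ordinary jobs.
jobs = [Job("hung-job", None), Job("snapshot", 3), Job("start-vm", 2)]
outcome = drain_queue(jobs, max_ticks=5)
print(outcome)
```

With the waiting period set to 5 ticks, the hung job is marked as expired and the two jobs behind it still run, with no manual DB update needed.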



Issue no# http://bugs.cloud.com/show_bug.cgi?id=12061
Job failure/retry mechanism:
********************
If a job fails due to some exception, we don't retry it after some time.

For example:
[This is not an accurate example, but it gives the idea.]

In the VMware case, you can't take snapshots of the root disk and the data disk of a VM at the same time. If you trigger snapshots on both disks at the same time, the first request succeeds and the second request fails with a proper limitation message.

The end user then has to initiate the snapshot on the other disk (i.e. the data disk) again.

My thought:
It would be good to keep the failed job in the queue, so that once the first job completes, the job manager picks up the waiting (failed) job from the queue and processes it.
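
A rough sketch of that re-queue behaviour, under simplifying assumptions: work proceeds in rounds, only one snapshot per VM can run in a round, and a conflicting request is deferred to the next round instead of being failed. The function run_snapshots and the VM and disk names are hypothetical, not part of CloudStack.

```python
from collections import deque
from typing import Deque, List, Tuple

Request = Tuple[str, str]  # (VM name, disk name)


def run_snapshots(requests: List[Request]) -> List[Tuple[int, str, str]]:
    """Return (round, vm, disk) for each snapshot in the order it ran."""
    pending: Deque[Request] = deque(requests)
    finished = []
    round_no = 0
    while pending:
        round_no += 1
        busy_vms = set()  # VMs that already have a snapshot in this round
        deferred: Deque[Request] = deque()
        while pending:
            vm, disk = pending.popleft()
            if vm in busy_vms:
                # Keep the request in the queue instead of failing it.
                deferred.append((vm, disk))
            else:
                busy_vms.add(vm)
                finished.append((round_no, vm, disk))
        pending = deferred
    return finished


result = run_snapshots([("vm1", "ROOT"), ("vm1", "DATA"), ("vm2", "ROOT")])
print(result)
```

Here the data-disk snapshot of vm1 is not rejected; it simply runs in the second round, after the root-disk snapshot of the same VM has finished.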

Issue no# http://bugs.cloud.com/show_bug.cgi?id=11531


Please feel free to add a few more data points here.

Usability in terms of UI refresh:
************************
CS still has a caching issue unless you manually click the refresh button. Sometimes you still see the cached values.


Issue no# http://bugs.cloudstack.org/browse/CS-14988


Error and exception handling, and coordination between tasks on the same resource
***************************************************************
I don't have many data points. If anybody has some, please share your views.

I will give one example, though.

Problem:

Powering on a stopped VM while at the same time taking a snapshot of its root disk fails: deploy VM failed with a lock problem (a java.lang.Exception occurred), but the snapshot job completed successfully. When startVM was tried again, the VM deployed successfully. Please check the attached log and execution logs.

Limitation:

This is not considered a problem under the current architecture. We currently don't coordinate tasks; we only throw runtime errors. While a snapshot is being taken, VM operations may be temporarily unavailable to the user, and the user needs to retry.
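
One possible way to coordinate such tasks, sketched below, is a lock per resource, so that a second task on the same VM waits for the first one instead of failing with a runtime error. This is only an illustration of the idea: run_on_resource, the "vm-1" identifier, and the sleep that stands in for real work are all assumptions, not how CloudStack behaves today.

```python
import threading
import time
from collections import defaultdict

# One lock per resource id, created on first use.
_resource_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()  # protects the lock registry itself
events = []


def run_on_resource(resource_id, task_name):
    """Run a task while holding the lock of the resource it touches."""
    with _registry_guard:
        lock = _resource_locks[resource_id]
    with lock:  # a second task on the same resource waits here
        events.append((task_name, "begin"))
        time.sleep(0.01)  # stand-in for the real work
        events.append((task_name, "end"))


workers = [
    threading.Thread(target=run_on_resource, args=("vm-1", name))
    for name in ("snapshot", "startVM")
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(events)
```

Whichever task gets the lock first runs to completion before the other begins, so the user does not have to retry by hand. Tasks on different VMs use different locks and are not serialized against each other.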


Also, regarding HA: in the upcoming release, will CloudStack HA/VMSync behavior have the same implementation for all hypervisors, or will the existing functionality remain unchanged?



Regards

Sadhu

Re: Things to consider for Upcomming releases

Posted by Kelven Yang <ke...@citrix.com>.

Job heartbeat (progress reports, etc.), job expiration, job cancellation, and
job throttling will be improved in the new architecture.

Kelven
