Posted to users@cloudstack.apache.org by Suresh Sadhu <Su...@citrix.com> on 2012/08/13 13:46:09 UTC
RE: Things to consider for Upcomming releases
Including a few more points.
Hi all,
As I have heard, the upcoming releases involve major architecture changes. It would be good to consider the following items for improvement, so that they help QA, Support, and customers. This would also reduce the number of support calls.
Please feel free to add to the list below if I have missed any data points or you have more suggestions for improvement. Kindly correct me if my assumptions or views are wrong.
-
Job in waiting state
*****************
--- We don't set a time limit for job completion, because we don't know how long a particular job will take. But due to this design, if an initial job goes into an infinite loop, the other jobs stay queued and wait for the first job to finish.
The only way out of this situation is to manually update the status field in the DB.
Is there a better way to overcome the above problem? Please share your views and thoughts.
My thought:
If we make job priority and job waiting period configurable parameters, the end user can set or update both based on his needs, so that even when one job is stuck in a waiting state, the other waiting jobs can still be triggered based on priority.
In the current design, if a job is in a waiting state, the end user can't stop it.
If we introduce configurable parameters, a job in a waiting (hung) state can come out of that state once the configured duration has expired.
Issue no# http://bugs.cloud.com/show_bug.cgi?id=12061
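To make the proposal above concrete, here is a minimal sketch of a queue with a configurable time limit and per-job priority. It is only an illustration under my own assumptions, not CloudStack code: the names Job, JobQueue, max_run_seconds and tick are hypothetical, and a real job manager would persist state in the DB rather than in memory.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical names throughout; this is not the CloudStack job manager.
@dataclass(order=True)
class Job:
    priority: int                      # lower number runs first
    seq: int                           # tie-breaker: submission order
    name: str = field(compare=False)
    started_at: Optional[float] = field(default=None, compare=False)
    status: str = field(default="WAITING", compare=False)


class JobQueue:
    def __init__(self, max_run_seconds):
        # The configurable limit after which a running job is given up on.
        self.max_run_seconds = max_run_seconds
        self._heap = []
        self._seq = itertools.count()
        self.running = None

    def submit(self, name, priority):
        job = Job(priority, next(self._seq), name)
        heapq.heappush(self._heap, job)
        return job

    def tick(self, now):
        """Expire the running job if it is over the limit, then start the
        highest-priority waiting job if nothing is running."""
        if self.running and now - self.running.started_at > self.max_run_seconds:
            self.running.status = "EXPIRED"
            self.running = None
        if self.running is None and self._heap:
            self.running = heapq.heappop(self._heap)
            self.running.started_at = now
            self.running.status = "RUNNING"
        return self.running
```

With this shape, a hung job blocks the queue for at most max_run_seconds, and nobody has to update the status field in the DB by hand.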
Job fails/retry mechanism :
********************
If a job fails due to an exception, we don't retry it after some time.
For example:
[This is not an exact example, but it gives the idea.]
In the VMware case, you can't take snapshots of the root disk and the data disk of a VM at the same time. If you trigger snapshots on both disks at the same time,
the first request succeeds and the second request fails with an appropriate limitation message.
The end user then has to initiate the snapshot on the other disk (i.e. the data disk) again.
My thought:
It would be good to keep the failed job in the queue, so that once the first job completes, the job manager picks up the waiting (failed) job from the queue and processes it.
Issue no# http://bugs.cloud.com/show_bug.cgi?id=11531
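A rough sketch of the retry idea, under my own assumptions: a job that fails because its resource is busy goes to the back of the queue instead of being reported as failed. ResourceBusy, run_with_requeue and max_attempts are hypothetical names used only for illustration; they are not existing CloudStack APIs.

```python
from collections import deque


class ResourceBusy(Exception):
    """Raised by a job when its target resource is in use (hypothetical)."""


def run_with_requeue(jobs, max_attempts=3):
    """Run (name, callable) jobs in order.

    A job that raises ResourceBusy is moved to the back of the queue and
    tried again later, up to max_attempts times in total.
    """
    queue = deque((name, func, 1) for name, func in jobs)
    results = {}
    while queue:
        name, func, attempt = queue.popleft()
        try:
            results[name] = func()
        except ResourceBusy:
            if attempt < max_attempts:
                queue.append((name, func, attempt + 1))
            else:
                results[name] = "FAILED"
    return results
```

In the snapshot example, the data disk request would be requeued when it hits the limitation and would run once the root disk snapshot is done, without the end user initiating it again.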
Please feel free to add a few more data points here.
Usability in terms of UI refresh:
************************
CS still has a caching issue: the UI shows stale values until you manually click the refresh button. Sometimes you still see the cached values.
Issue no# http://bugs.cloudstack.org/browse/CS-14988
Error & exception handling and coordination between tasks on the same resource
***************************************************************
I don't have many data points. If anybody has some, please share your views.
But I will give one example:
Problem:
Power on a stopped VM and at the same time take a snapshot of its root disk: fail. (Deploy VM failed with a lock problem, a java.lang.Exception occurred, but the snapshot job completed successfully. I tried startVM again and this time the VM deployed successfully.) Please check the attached log and execution logs.
Limitation:
This is not a problem under the current architecture. We currently don't coordinate tasks; we throw runtime errors instead. While a snapshot is being taken, VM operations may be temporarily unavailable to the user, and the user needs to retry.
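For comparison, a minimal sketch of what coordination between tasks on the same resource could look like: each operation waits a bounded time for a per-VM lock instead of failing immediately. This is an assumption of mine, not how CloudStack works today; vm_operation, wait_seconds and the in-memory lock registry are hypothetical, and a management server cluster would need a shared (for example DB-backed) lock rather than threading.Lock.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory registry: one lock per VM id.
_registry_guard = threading.Lock()
_vm_locks = defaultdict(threading.Lock)


@contextmanager
def vm_operation(vm_id, wait_seconds):
    """Serialize operations on one VM.

    Waits up to wait_seconds for the VM's lock and raises TimeoutError
    if another operation still holds it after that.
    """
    with _registry_guard:
        lock = _vm_locks[vm_id]
    if not lock.acquire(timeout=wait_seconds):
        raise TimeoutError(f"VM {vm_id} is busy with another operation")
    try:
        yield
    finally:
        lock.release()
```

A startVM request wrapped in vm_operation would then wait for a running snapshot on the same VM to finish, up to the configured limit, while operations on other VMs proceed unaffected.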
Also, for HA: is the CloudStack HA/VMSync behavior going to be the same (implementation) for all hypervisors, or will the functionality stay as it is (no change to existing functionality) in the upcoming release?
Regards
Sadhu
Re: Things to consider for Upcomming releases
Posted by Kelven Yang <ke...@citrix.com>.
Job heartbeat (progress reports etc.), job expiration, job cancellation, and
job throttling will be improved in the new architecture.
Kelven
On 8/13/12 4:46 AM, "Suresh Sadhu" <Su...@citrix.com> wrote:
>
>Including few more points ..
>
>HI All,
>
>As I heard , Upcoming releases has major architecture changes involved.
>It will be good if we consider the following items for better
>improvement.so that it will help QA/Support and customers. Also it will
>minimize support calls count.
>
>Also please feel to add if I miss any data points or you feel you can add
>few more points for improvements to the below list... kindly correct me
>if my assumption/views are wrong.
>
>-
>
>Job in waiting state
>*****************
>--- we don't fix the time to job completion ..because we don't know how
>much time will it take to complete a particular job But due to this
>design any initials job went in loop/infinite then other jobs are queued
>and wait for first job to finish.
>
>The only way to come out of this situation is ..manually update the field
>status in the DB.
>
>Is there any alternate(better) way to overcome the above problem...
>please share your view and thoughts
>
>MY though:
>If we put job priority/ Job waiting period as configurable parameters
>and end user can set/update the priority based on his needs and also
>waiting period.so that even one job in waiting state based on priority
>other waiting job needs to trigger.
>
>In Current design if one job is in waiting state.. end user can't stop
>the job.
>So if we introduce configurable parameters so the job in waiting(hanged
>state ) can be come out after configured duration over /expired.
>
>
>
>Issue no# http://bugs.cloud.com/show_bug.cgi?id=12061
>Job fails/retry mechanism :
>********************
>If any job fails due to some exception we don't try after some time.
>
>Like example:
>[ It's not accurate example but gives some info]
>
>In Vmware case: you can't take snapshot on root and data disk of vm at
>the same time. If you try to trigger the snapshot on both disk on same
>time.
>First request will be succeeded and second request will failed with
>proper limitation message.
>
>Again end user has to initiate the snapshot on another disk(i. datadisk)
>
>My Thought:
>It will be good if we keep the failed job in queue and once first job
>completes ..Job manager should take/consider waiting job(failed job) in
>queue and process it.
>
>Issue no# http://bugs.cloud.com/show_bug.cgi?id=11531
>
>
>Please feel free to add few more data points here.
>
>Usability in terms of UI refresh:
>************************
>CS has still has caching issue until and unless you manually click on
>refresh button. Sometimes you still see the cached values.
>
>
>Issue no#http://bugs.cloudstack.org/browse/CS-14988
>
>
>Error &Exception Handling & coordination between the tasks on same
>resource.
>***************************************************************
>I don't have much data points .if anybody has please share your views.
>
>But will give one example:
>
>Problem:
>
>Power on stopped VM and at the same time perform snapshot on root disk-
>Fail(deploy VM failed with lock problem-Java.lang.exception occurred but
>snapshot jib completed successfully and tried again startVM this time its
>deployed successfully.)please check the attached log and execution logs.
>
>Limitation:
>
>This is not a problem under current architecture. We currently don't
>coordinate tasks but to throw runtime errors, when a snapshot task is
>being taken, VM operation may be temporarily unavailable to user and user
>needs to retry
>
>
>Also for HA CloudStack HA/VMSync behavior is going to be
>same(implementation) for all hypervisor or still the functionality is
>same(no change in existing functionality) in upcoming release also.
>
>
>
>Regards
>
>Sadhu