Posted to dev@cloudstack.apache.org by Khosrow Moossavi <km...@cloudops.com> on 2018/01/09 17:25:59 UTC

Squeeze another PR (#2398) in 4.11 milestone

Hi community

We've found [1] and fixed [2] an issue in 4.10 where snapshots remain on
primary storage (XenServer + Swift), which causes the VDI chain to fill up
over time until the user cannot take another snapshot.

Please include this in 4.11 milestone if you see fit.

[1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
[2]: https://github.com/apache/cloudstack/pull/2398

Thanks
Khosrow

Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by Rohit Yadav <ro...@shapeblue.com>.
Hi all,


I think the criteria for any additional PRs to be considered for the 4.11.0.0 release should be:

- Is it a blocker, especially blocking the release?

- Is it fixing any test failures or regressions?

- Is it release-related, for example packaging, systemvmtemplate, or db-upgrade path changes?


Is this agreeable?


If we miss any bugfix PRs, they may be sent to the 4.11 branch (to be cut soon) and in future make their way into a minor release such as 4.11.1.0.


To keep up with the release schedule, I think we should be reluctant to move the goalposts again. I'll post an update on the outstanding milestone PRs by EOD today. Thanks for your understanding.


- Rohit

<https://cloudstack.apache.org>



________________________________
From: Khosrow Moossavi <km...@cloudops.com>
Sent: Tuesday, January 9, 2018 10:55:59 PM
To: dev@cloudstack.apache.org; users@cloudstack.apache.org
Subject: Squeeze another PR (#2398) in 4.11 milestone

Hi community

We've found [1] and fixed [2] an issue in 4.10 where snapshots remain on
primary storage (XenServer + Swift), which causes the VDI chain to fill up
over time until the user cannot take another snapshot.

Please include this in 4.11 milestone if you see fit.

[1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
[2]: https://github.com/apache/cloudstack/pull/2398

Thanks
Khosrow

rohit.yadav@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue


Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by Rafael Weingärtner <ra...@gmail.com>.
Yes. That is actually what we do.

Looking at the code of "Xenserver625StorageProcessor.java
<https://github.com/apache/cloudstack/pull/2398/files#diff-6eeb1a2fb818cccb14785ee80c93a561>",
it seems that we were already doing this even before PR #2398.
However, I might not be understanding the complete picture here...

On Tue, Jan 9, 2018 at 7:33 PM, Tutkowski, Mike <Mi...@netapp.com>
wrote:

> “technically we should only have "one" on primary storage at any given
> point in time”
>
> I just wanted to follow up on this one.
>
> When we are copying a delta from the previous snapshot, we should actually
> have two snapshots on primary storage for a time.
>
> If the delta copy is successful, then we delete the older snapshot. If the
> delta copy fails, then we delete the newest snapshot.
>
> Is that correct?
>
> > On Jan 9, 2018, at 11:36 AM, Khosrow Moossavi <km...@cloudops.com>
> wrote:
> >
> > "We are already deleting snapshots in the primary storage, but we always
> > leave behind the last one"
> >
> > This issue doesn't happen only when something fails. We are not deleting
> the
> > snapshots from primary storage (not on XenServer 6.25+ and not since Feb
> > 2017)
> >
> > The fix of this PR is:
> >
> > 1) when transferred successfully to secondary storage everything except
> > "this"
> > snapshot get removed (technically we should only have "one" on primary
> > storage
> > at any given point in time) [towards the end of try block]
> > 2) when transferring to secondary storage fails, only "this" in-progress
> > snapshot
> > gets deleted. [finally block]
> >
> >
> >
> > On Tue, Jan 9, 2018 at 1:01 PM, Rafael Weingärtner <
> > rafaelweingartner@gmail.com> wrote:
> >
> >> Khosrow, I have seen this issue as well. It happens when there are
> problems
> >> to transfer the snapshot from the primary to the secondary storage.
> >> However, we need to clarify one thing. We are already deleting
> snapshots in
> >> the primary storage, but we always leave behind the last one. The
> problem
> >> is that if an error happens, during the transfer of the VHD from the
> >> primary to the secondary storage. The failed snapshot VDI is left
> behind in
> >> primary storage (for XenServer). These failed snapshots can accumulate
> with
> >> time and cause the problem you described because XenServer will not be
> able
> >> to coalesce the VHD files of the VM. Therefore, what you are addressing
> in
> >> this PR are cases when an exception happens during the transfer from
> >> primary to secondary storage.
> >>
> >> On Tue, Jan 9, 2018 at 3:25 PM, Khosrow Moossavi <
> kmoossavi@cloudops.com>
> >> wrote:
> >>
> >>> Hi community
> >>>
> >>> We've found [1] and fixed [2] an issue in 4.10 regarding snapshots
> >>> remaining on primary storage (XenServer + Swift) which causes VDI chain
> >>> gets full after some time and user cannot take another snapshot.
> >>>
> >>> Please include this in 4.11 milestone if you see fit.
> >>>
> >>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
> >>> [2]: https://github.com/apache/cloudstack/pull/2398
> >>>
> >>> Thanks
> >>> Khosrow
> >>>
> >>
> >>
> >>
> >> --
> >> Rafael Weingärtner
> >>
>



-- 
Rafael Weingärtner

Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by Khosrow Moossavi <km...@cloudops.com>.
Rafael,

It was changed in this PR:
https://github.com/apache/cloudstack/pull/1749/files#diff-6eeb1a2fb818cccb14785ee80c93a561R560



On Tue, Jan 9, 2018 at 4:44 PM, Khosrow Moossavi <km...@cloudops.com>
wrote:

> That is correct Mike. The quoted part above was misleading, it should have
> been "at any given point in time *when transaction finished*"
> Removal of "other" or "current failed" snapshot happens at the very end of
> the method. The state of SR throughout time would be something like:
>
> 1) snapshot-01 (at rest)
> 2) snapshot-01, snapshot-02 (while taking snapshot-02 on primary storage
> and sending to secondary storage)
> 3) snapshot-02 (at rest again, after successful)
> OR
> 3) snapshot-01 (at rest again, after failure)
>
>
> Khosrow Moossavi
>
> Cloud Infrastructure Developer
>
> t 514.447.3456
>
> <https://goo.gl/NYZ8KK>
>
>
>
> On Tue, Jan 9, 2018 at 4:33 PM, Tutkowski, Mike <Mike.Tutkowski@netapp.com
> > wrote:
>
>> “technically we should only have "one" on primary storage at any given
>> point in time”
>>
>> I just wanted to follow up on this one.
>>
>> When we are copying a delta from the previous snapshot, we should
>> actually have two snapshots on primary storage for a time.
>>
>> If the delta copy is successful, then we delete the older snapshot. If
>> the delta copy fails, then we delete the newest snapshot.
>>
>> Is that correct?
>>
>> > On Jan 9, 2018, at 11:36 AM, Khosrow Moossavi <km...@cloudops.com>
>> wrote:
>> >
>> > "We are already deleting snapshots in the primary storage, but we always
>> > leave behind the last one"
>> >
>> > This issue doesn't happen only when something fails. We are not
>> deleting the
>> > snapshots from primary storage (not on XenServer 6.25+ and not since Feb
>> > 2017)
>> >
>> > The fix of this PR is:
>> >
>> > 1) when transferred successfully to secondary storage everything except
>> > "this"
>> > snapshot get removed (technically we should only have "one" on primary
>> > storage
>> > at any given point in time) [towards the end of try block]
>> > 2) when transferring to secondary storage fails, only "this" in-progress
>> > snapshot
>> > gets deleted. [finally block]
>> >
>> >
>> >
>> > On Tue, Jan 9, 2018 at 1:01 PM, Rafael Weingärtner <
>> > rafaelweingartner@gmail.com> wrote:
>> >
>> >> Khosrow, I have seen this issue as well. It happens when there are
>> problems
>> >> to transfer the snapshot from the primary to the secondary storage.
>> >> However, we need to clarify one thing. We are already deleting
>> snapshots in
>> >> the primary storage, but we always leave behind the last one. The
>> problem
>> >> is that if an error happens, during the transfer of the VHD from the
>> >> primary to the secondary storage. The failed snapshot VDI is left
>> behind in
>> >> primary storage (for XenServer). These failed snapshots can accumulate
>> with
>> >> time and cause the problem you described because XenServer will not be
>> able
>> >> to coalesce the VHD files of the VM. Therefore, what you are
>> addressing in
>> >> this PR are cases when an exception happens during the transfer from
>> >> primary to secondary storage.
>> >>
>> >> On Tue, Jan 9, 2018 at 3:25 PM, Khosrow Moossavi <
>> kmoossavi@cloudops.com>
>> >> wrote:
>> >>
>> >>> Hi community
>> >>>
>> >>> We've found [1] and fixed [2] an issue in 4.10 regarding snapshots
>> >>> remaining on primary storage (XenServer + Swift) which causes VDI
>> chain
>> >>> gets full after some time and user cannot take another snapshot.
>> >>>
>> >>> Please include this in 4.11 milestone if you see fit.
>> >>>
>> >>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
>> >>> [2]: https://github.com/apache/cloudstack/pull/2398
>> >>>
>> >>> Thanks
>> >>> Khosrow
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Rafael Weingärtner
>> >>
>>
>
>

Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by Khosrow Moossavi <km...@cloudops.com>.
That is correct, Mike. The quoted part above was misleading; it should have
been "at any given point in time *when the transaction has finished*".
Removal of the "other" or the "current failed" snapshot happens at the very
end of the method. The state of the SR over time would be something like:

1) snapshot-01 (at rest)
2) snapshot-01, snapshot-02 (while taking snapshot-02 on primary storage
and sending it to secondary storage)
3) snapshot-02 (at rest again, after success)
OR
3) snapshot-01 (at rest again, after failure)
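
Put as a post-condition (purely illustrative Java, not code from the PR),
the invariant is simply that primary storage is back to exactly one
snapshot for the volume once the call returns, whichever branch was taken:

import java.util.Arrays;
import java.util.List;

public final class SnapshotAtRestCheck {

    // Placeholder: in reality this would ask the SR which snapshot VDIs of
    // the volume are still present on primary storage.
    static List<String> snapshotsOnPrimary(String volumeUuid) {
        return Arrays.asList("snapshot-02");
    }

    public static void main(String[] args) {
        List<String> atRest = snapshotsOnPrimary("volume-uuid");
        if (atRest.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one snapshot at rest, found " + atRest.size());
        }
        System.out.println("at rest with " + atRest.get(0));
    }
}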


Khosrow Moossavi

Cloud Infrastructure Developer

t 514.447.3456

<https://goo.gl/NYZ8KK>



On Tue, Jan 9, 2018 at 4:33 PM, Tutkowski, Mike <Mi...@netapp.com>
wrote:

> “technically we should only have "one" on primary storage at any given
> point in time”
>
> I just wanted to follow up on this one.
>
> When we are copying a delta from the previous snapshot, we should actually
> have two snapshots on primary storage for a time.
>
> If the delta copy is successful, then we delete the older snapshot. If the
> delta copy fails, then we delete the newest snapshot.
>
> Is that correct?
>
> > On Jan 9, 2018, at 11:36 AM, Khosrow Moossavi <km...@cloudops.com>
> wrote:
> >
> > "We are already deleting snapshots in the primary storage, but we always
> > leave behind the last one"
> >
> > This issue doesn't happen only when something fails. We are not deleting
> the
> > snapshots from primary storage (not on XenServer 6.25+ and not since Feb
> > 2017)
> >
> > The fix of this PR is:
> >
> > 1) when transferred successfully to secondary storage everything except
> > "this"
> > snapshot get removed (technically we should only have "one" on primary
> > storage
> > at any given point in time) [towards the end of try block]
> > 2) when transferring to secondary storage fails, only "this" in-progress
> > snapshot
> > gets deleted. [finally block]
> >
> >
> >
> > On Tue, Jan 9, 2018 at 1:01 PM, Rafael Weingärtner <
> > rafaelweingartner@gmail.com> wrote:
> >
> >> Khosrow, I have seen this issue as well. It happens when there are
> problems
> >> to transfer the snapshot from the primary to the secondary storage.
> >> However, we need to clarify one thing. We are already deleting
> snapshots in
> >> the primary storage, but we always leave behind the last one. The
> problem
> >> is that if an error happens, during the transfer of the VHD from the
> >> primary to the secondary storage. The failed snapshot VDI is left
> behind in
> >> primary storage (for XenServer). These failed snapshots can accumulate
> with
> >> time and cause the problem you described because XenServer will not be
> able
> >> to coalesce the VHD files of the VM. Therefore, what you are addressing
> in
> >> this PR are cases when an exception happens during the transfer from
> >> primary to secondary storage.
> >>
> >> On Tue, Jan 9, 2018 at 3:25 PM, Khosrow Moossavi <
> kmoossavi@cloudops.com>
> >> wrote:
> >>
> >>> Hi community
> >>>
> >>> We've found [1] and fixed [2] an issue in 4.10 regarding snapshots
> >>> remaining on primary storage (XenServer + Swift) which causes VDI chain
> >>> gets full after some time and user cannot take another snapshot.
> >>>
> >>> Please include this in 4.11 milestone if you see fit.
> >>>
> >>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
> >>> [2]: https://github.com/apache/cloudstack/pull/2398
> >>>
> >>> Thanks
> >>> Khosrow
> >>>
> >>
> >>
> >>
> >> --
> >> Rafael Weingärtner
> >>
>

Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by "Tutkowski, Mike" <Mi...@netapp.com>.
“technically we should only have "one" on primary storage at any given point in time”

I just wanted to follow up on this one.

When we are copying a delta from the previous snapshot, we should actually have two snapshots on primary storage for a time.

If the delta copy is successful, then we delete the older snapshot. If the delta copy fails, then we delete the newest snapshot.

Is that correct?

> On Jan 9, 2018, at 11:36 AM, Khosrow Moossavi <km...@cloudops.com> wrote:
> 
> "We are already deleting snapshots in the primary storage, but we always
> leave behind the last one"
> 
> This issue doesn't happen only when something fails. We are not deleting the
> snapshots from primary storage (not on XenServer 6.25+ and not since Feb
> 2017)
> 
> The fix of this PR is:
> 
> 1) when transferred successfully to secondary storage everything except
> "this"
> snapshot get removed (technically we should only have "one" on primary
> storage
> at any given point in time) [towards the end of try block]
> 2) when transferring to secondary storage fails, only "this" in-progress
> snapshot
> gets deleted. [finally block]
> 
> 
> 
> On Tue, Jan 9, 2018 at 1:01 PM, Rafael Weingärtner <
> rafaelweingartner@gmail.com> wrote:
> 
>> Khosrow, I have seen this issue as well. It happens when there are problems
>> to transfer the snapshot from the primary to the secondary storage.
>> However, we need to clarify one thing. We are already deleting snapshots in
>> the primary storage, but we always leave behind the last one. The problem
>> is that if an error happens, during the transfer of the VHD from the
>> primary to the secondary storage. The failed snapshot VDI is left behind in
>> primary storage (for XenServer). These failed snapshots can accumulate with
>> time and cause the problem you described because XenServer will not be able
>> to coalesce the VHD files of the VM. Therefore, what you are addressing in
>> this PR are cases when an exception happens during the transfer from
>> primary to secondary storage.
>> 
>> On Tue, Jan 9, 2018 at 3:25 PM, Khosrow Moossavi <km...@cloudops.com>
>> wrote:
>> 
>>> Hi community
>>> 
>>> We've found [1] and fixed [2] an issue in 4.10 regarding snapshots
>>> remaining on primary storage (XenServer + Swift) which causes VDI chain
>>> gets full after some time and user cannot take another snapshot.
>>> 
>>> Please include this in 4.11 milestone if you see fit.
>>> 
>>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
>>> [2]: https://github.com/apache/cloudstack/pull/2398
>>> 
>>> Thanks
>>> Khosrow
>>> 
>> 
>> 
>> 
>> --
>> Rafael Weingärtner
>> 

Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by Khosrow Moossavi <km...@cloudops.com>.
"We are already deleting snapshots in the primary storage, but we always
leave behind the last one"

This issue doesn't happen only when something fails. We are not deleting the
snapshots from primary storage (not on XenServer 6.25+ and not since Feb
2017).

The fix in this PR is:

1) when the snapshot is transferred successfully to secondary storage,
everything except "this" snapshot gets removed (technically we should only
have "one" on primary storage at any given point in time) [towards the end
of the try block]
2) when transferring to secondary storage fails, only "this" in-progress
snapshot gets deleted [finally block]
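
To make the shape of the change a bit more concrete, here is a rough
pseudo-Java sketch of the control flow described above. The helper names
are purely illustrative (they are not the actual methods in
Xenserver625StorageProcessor.java); the real code is in the PR diff:

public final class SnapshotBackupSketch {

    /**
     * Illustrative outline only: take a snapshot, try to back it up,
     * then clean up primary storage depending on the outcome.
     */
    public String backupSnapshot(String volumeUuid) {
        // Primary storage now holds the previous snapshot plus this new one.
        String snapshotUuid = takeVdiSnapshot(volumeUuid);
        boolean backedUp = false;
        try {
            String secondaryPath = copyToSecondaryStorage(snapshotUuid); // may throw
            backedUp = true;
            // Success: delete every snapshot of this volume except the one just
            // backed up, so only "this" snapshot stays on primary storage.
            destroyAllSnapshotsExcept(volumeUuid, snapshotUuid);
            return secondaryPath;
        } finally {
            if (!backedUp) {
                // Failure: delete only the in-progress snapshot; the previous one
                // stays behind so the next backup still has a parent to delta from.
                destroySnapshot(snapshotUuid);
            }
        }
    }

    // Placeholder stubs so the sketch compiles; the real logic lives in the PR.
    private String takeVdiSnapshot(String volumeUuid) { return "snapshot-02"; }
    private String copyToSecondaryStorage(String snapshotUuid) { return "swift://container/" + snapshotUuid; }
    private void destroyAllSnapshotsExcept(String volumeUuid, String keepUuid) { /* no-op in the sketch */ }
    private void destroySnapshot(String snapshotUuid) { /* no-op in the sketch */ }
}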



On Tue, Jan 9, 2018 at 1:01 PM, Rafael Weingärtner <
rafaelweingartner@gmail.com> wrote:

> Khosrow, I have seen this issue as well. It happens when there are problems
> to transfer the snapshot from the primary to the secondary storage.
> However, we need to clarify one thing. We are already deleting snapshots in
> the primary storage, but we always leave behind the last one. The problem
> is that if an error happens, during the transfer of the VHD from the
> primary to the secondary storage. The failed snapshot VDI is left behind in
> primary storage (for XenServer). These failed snapshots can accumulate with
> time and cause the problem you described because XenServer will not be able
> to coalesce the VHD files of the VM. Therefore, what you are addressing in
> this PR are cases when an exception happens during the transfer from
> primary to secondary storage.
>
> On Tue, Jan 9, 2018 at 3:25 PM, Khosrow Moossavi <km...@cloudops.com>
> wrote:
>
> > Hi community
> >
> > We've found [1] and fixed [2] an issue in 4.10 regarding snapshots
> > remaining on primary storage (XenServer + Swift) which causes VDI chain
> > gets full after some time and user cannot take another snapshot.
> >
> > Please include this in 4.11 milestone if you see fit.
> >
> > [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
> > [2]: https://github.com/apache/cloudstack/pull/2398
> >
> > Thanks
> > Khosrow
> >
>
>
>
> --
> Rafael Weingärtner
>

Re: Squeeze another PR (#2398) in 4.11 milestone

Posted by Rafael Weingärtner <ra...@gmail.com>.
Khosrow, I have seen this issue as well. It happens when there are problems
transferring the snapshot from the primary to the secondary storage.
However, we need to clarify one thing. We are already deleting snapshots in
the primary storage, but we always leave behind the last one. The problem
is that if an error happens during the transfer of the VHD from the
primary to the secondary storage, the failed snapshot VDI is left behind in
primary storage (for XenServer). These failed snapshots can accumulate over
time and cause the problem you described, because XenServer will not be able
to coalesce the VHD files of the VM. Therefore, what you are addressing in
this PR are the cases where an exception happens during the transfer from
primary to secondary storage.
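
For anyone who wants to check whether they are affected, counting the
snapshot VDIs still sitting on the pool is enough to see the symptom. A
rough sketch with the XenServer Java bindings that CloudStack bundles
follows; treat the class and field names as my recollection of that SDK
rather than verified code, and the host and credentials as placeholders:

import com.xensource.xenapi.Connection;
import com.xensource.xenapi.Session;
import com.xensource.xenapi.VDI;

import java.net.URL;

public final class LeftoverSnapshotCount {
    public static void main(String[] args) throws Exception {
        // Placeholder pool master, credentials and API version string.
        Connection conn = new Connection(new URL("https://xenserver-host"));
        Session.loginWithPassword(conn, "root", "password", "1.3");
        try {
            int snapshotVdis = 0;
            for (VDI.Record rec : VDI.getAllRecords(conn).values()) {
                // Count every snapshot VDI on the pool; a number that keeps
                // growing over time is the symptom described in this thread,
                // since those VDIs block coalescing of the VHD chain.
                if (Boolean.TRUE.equals(rec.isASnapshot)) {
                    snapshotVdis++;
                    System.out.println("snapshot VDI " + rec.uuid + " (" + rec.nameLabel + ")");
                }
            }
            System.out.println("snapshot VDIs on this pool: " + snapshotVdis);
        } finally {
            Session.logout(conn);
        }
    }
}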

On Tue, Jan 9, 2018 at 3:25 PM, Khosrow Moossavi <km...@cloudops.com>
wrote:

> Hi community
>
> We've found [1] and fixed [2] an issue in 4.10 regarding snapshots
> remaining on primary storage (XenServer + Swift) which causes VDI chain
> gets full after some time and user cannot take another snapshot.
>
> Please include this in 4.11 milestone if you see fit.
>
> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-10222
> [2]: https://github.com/apache/cloudstack/pull/2398
>
> Thanks
> Khosrow
>



-- 
Rafael Weingärtner