Posted to dev@cloudstack.apache.org by Gabriel Beims Bräscher <ga...@gmail.com> on 2019/10/03 18:46:20 UTC

Re: 4.13 rbd snapshot delete failed

Hello folks,

Just pinging that I have created PR
https://github.com/apache/cloudstack/pull/3615 addressing the snapshot
deletion issue #3586 (https://github.com/apache/cloudstack/issues/3586).
Please, feel free to test and review.

Regards,
Gabriel.

On Mon, Sep 9, 2019 at 12:08, Gabriel Beims Bräscher <gabrascher@gmail.com> wrote:

> Thanks for the feedback Andrija and Andrei.
>
> I have opened issue #3590 for the snapshot rollback issue raised by
> Andrija.
> I will be investigating both issues:
> - RBD snapshot Revert #3590 (
> https://github.com/apache/cloudstack/issues/3590)
> - RBD snapshot deletion #3586 (
> https://github.com/apache/cloudstack/issues/3586)
>
> Cheers,
> Gabriel
>
> On Mon, Sep 9, 2019 at 09:41, Andrei Mikhailovsky <an...@arhont.com> wrote:
>
>> Some quick feedback from my side: snapshot deletion has never worked properly
>> for me with Ceph. Every week or so I have to manually delete all Ceph
>> snapshots. The NFS secondary storage snapshots, however, are deleted just
>> fine. I've been using CloudStack for 5+ years and it has always been the case. I
>> am currently running 4.11.2 with Ceph 13.2.6-1xenial.
>>
>> Andrei
>>
>> ----- Original Message -----
>> > From: "Andrija Panic" <an...@gmail.com>
>> > To: "Gabriel Beims Bräscher" <ga...@gmail.com>
>> > Cc: "users" <us...@cloudstack.apache.org>, "dev" <
>> dev@cloudstack.apache.org>
>> > Sent: Sunday, 8 September, 2019 19:17:59
>> > Subject: Re: 4.13 rbd snapshot delete failed
>>
>> > Thanks, Gabriel, for the extensive feedback.
>> > Actually, my former company added the code to really delete an RBD snapshot
>> > back in 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code
>> > is there, but probably some exception is happening, or there is a regression...
>> >
>> > Cheers
>> >
>> > On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher <gabrascher@gmail.com
>> >
>> > wrote:
>> >
>> >> Thanks for the feedback, Andrija. It looks like deletion was not fully
>> >> supported then (am I missing something?). I will look into this and
>> >> open a PR adding proper support for RBD snapshot deletion if necessary.
>> >>
>> >> Regarding the rollback, I have tested it several times and it worked;
>> >> however, I see a weak point in the Ceph rollback implementation.
>> >>
>> >> It looks like Li Jerry was able to execute the rollback without any
>> >> problem. Li, could you please post the log output here: "Attempting to
>> >> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
>> >> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
>> >> happens prior to it; the only way for you to check those values is via remote
>> >> debugging. If you are able to post those values, it would also help with
>> >> sorting out what is wrong.
>> >>
>> >> I am checking the code base, running a few tests, and evaluating the log
>> >> that you (Andrija) sent. What I can say for now is that it looks like the
>> >> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
>> >> code that can definitely break the rollback execution flow. My tests had
>> >> pointed to a pattern, but now I see other possibilities. I will probably
>> >> add a few parameters to the rollback/revert command instead of using the
>> >> path, or review the path life-cycle and the different execution flows in
>> >> order to make it safer to use.
>> >> [1]
>> >>
>> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
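
(On the point above about not relying on snapshot.getPath(): the idea is simply to have the
revert command carry the RBD coordinates explicitly. A purely hypothetical sketch of such a
payload -- this is not an existing CloudStack class, just an illustration of the shape of the
data the agent would receive:)

    // Hypothetical command payload: the KVM agent would receive the pool, image, and snapshot
    // names directly instead of reconstructing them from the snapshot path.
    public class RevertRbdSnapshotParams {
        private final String poolName;      // RBD pool backing the primary storage
        private final String imageName;     // RBD image (volume UUID) on that pool
        private final String snapshotName;  // snapshot to roll the image back to

        public RevertRbdSnapshotParams(String poolName, String imageName, String snapshotName) {
            this.poolName = poolName;
            this.imageName = imageName;
            this.snapshotName = snapshotName;
        }

        public String getPoolName() { return poolName; }
        public String getImageName() { return imageName; }
        public String getSnapshotName() { return snapshotName; }
    }
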
>> >>
>> >> A few details on the test environment and the Ceph/RBD versions:
>> >> CloudStack, KVM, and Ceph nodes are running Ubuntu 18.04.
>> >> Ceph version is 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable).
>> >> RADOS Block Device (RBD) has had snapshot rollback support since Ceph v10.0.2
>> >> [https://github.com/ceph/ceph/pull/6878].
>> >> Rados-java [https://github.com/ceph/rados-java] has supported snapshot
>> >> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack 4.13.0.0.
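
(For anyone following along, this is roughly the agent-side call path the rollback boils down to
once the identifiers are known. It is only a sketch: the monitor address, key, pool, and image
names below are placeholders, and the exact name of the rollback method added in rados-java
0.5.0 should be checked against the library, so that call is deliberately left commented out:)

    import com.ceph.rados.IoCTX;
    import com.ceph.rados.Rados;
    import com.ceph.rbd.Rbd;
    import com.ceph.rbd.RbdImage;

    public class RbdSnapshotRollbackSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; in CloudStack these come from the primary storage pool.
            Rados rados = new Rados("admin");
            rados.confSet("mon_host", "ceph-mon.example.com:6789");
            rados.confSet("key", "<cephx key>");
            rados.connect();

            IoCTX io = rados.ioCtxCreate("cloudstack");                        // RBD pool name
            Rbd rbd = new Rbd(io);
            RbdImage image = rbd.open("ac510428-5d09-4e86-9d34-9dfab3715b7c"); // volume UUID = image name

            // The rollback call itself is provided by rados-java >= 0.5.0; verify the exact
            // method name/signature in the library before relying on it:
            // image.snapRollBack("<snapshot name>");

            rbd.close(image);
            rados.ioCtxDestroy(io);
        }
    }
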
>> >>
>> >> I will be updating here soon.
>> >>
>> >> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander <wi...@widodh.nl> wrote:
>> >>
>> >>>
>> >>>
>> >>> On 9/8/19 5:26 AM, Andrija Panic wrote:
>> >>> > Many releases ago, deleting a Ceph volume snapshot was also only deleting
>> >>> > it in the DB, so RBD performance became terrible with many tens of (i.e.
>> >>> > hourly) snapshots. I'll try to verify this on 4.13 myself, but Wido and the
>> >>> > guys will know better...
>> >>>
>> >>> I pinged Gabriel and he's looking into it. He'll get back to it.
>> >>>
>> >>> Wido
>> >>>
>> >>> >
>> >>> > I
>> >>> >
>> >>> > On Sat, Sep 7, 2019, 08:34 li jerry <di...@hotmail.com> wrote:
>> >>> >
>> >>> >> I found it had nothing to do with storage.cleanup.delay and
>> >>> >> storage.cleanup.interval.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> The reason is that when DeleteSnapshotCmd is executed, because the RBD
>> >>> >> snapshot has not been copied to secondary storage, it only changes the
>> >>> >> database information and does not go to the primary storage to delete the
>> >>> >> snapshot.
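
(That is the gist of what the fix needs to address: for a snapshot that exists only on the RBD
primary storage, the delete has to reach the pool and remove the snap on the image itself, not
just the DB row. A rough sketch of that step, assuming an already-connected rados-java IoCTX
for the pool; the method names are as I recall them from the rados-java bindings used elsewhere
in the KVM plugin, so double-check them rather than treating this as the actual PR code:)

    import com.ceph.rados.IoCTX;
    import com.ceph.rbd.Rbd;
    import com.ceph.rbd.RbdImage;

    public class RbdSnapshotDeleteSketch {
        // Removes a single snapshot from an RBD image on the primary storage pool.
        public static void deleteSnapshot(IoCTX io, String imageName, String snapshotName) throws Exception {
            Rbd rbd = new Rbd(io);
            RbdImage image = rbd.open(imageName);
            try {
                // If the snapshot had been protected (e.g. used for a clone), it may need
                // snapUnprotect() first -- assumption, not verified in this environment.
                image.snapRemove(snapshotName); // removes the snap on Ceph, not just in the database
            } finally {
                rbd.close(image);
            }
        }
    }
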
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> Log===========================
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet]
>> >>> >> (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START===
>> >>> 192.168.254.3
>> >>> >> -- GET
>> >>> >>
>> >>>
>> command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer]
>> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs
>> from
>> >>> >> which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is
>> >>> allowed
>> >>> >> to perform API calls: 0.0.0.0/0,::/0
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer]
>> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8)
>> Retrieved
>> >>> >> cmdEventType from job info: SNAPSHOT.DELETE
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,217 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>> >>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add
>> >>> job-1378
>> >>> >> into job monitoring
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit
>> >>> async
>> >>> >> job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2,
>> >>> >> instanceType: Snapshot, instanceId: 13, cmd:
>> >>> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
>> >>> cmdInfo:
>> >>> >>
>> >>>
>> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
>> >>> >>
>> >>>
>> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>> >>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode:
>> 0,
>> >>> >> result: null, initMsid: 2200502468634, completeMsid: null,
>> lastUpdated:
>> >>> >> null, lastPolled: null, created: null, removed: null}
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> >>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097)
>> Executing
>> >>> >> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType:
>> Snapshot,
>> >>> >> instanceId: 13, cmd:
>> >>> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
>> >>> cmdInfo:
>> >>> >>
>> >>>
>> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
>> >>> >>
>> >>>
>> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>> >>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode:
>> 0,
>> >>> >> result: null, initMsid: 2200502468634, completeMsid: null,
>> lastUpdated:
>> >>> >> null, lastPolled: null, created: null, removed: null}
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
>> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8)
>> ===END===
>> >>> >> 192.168.254.3 -- GET
>> >>> >>
>> >>>
>> command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
>> >>> >> (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853:
>> >>> Routing
>> >>> >> from 2199066247173
>> >>> >>
>> >>> >> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy]
>> >>> >> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4)
>> >>> (logid:1cee5097)
>> >>> >> Can't find snapshot on backup storage, delete it in db
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> -Jerry
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> ________________________________
>> >>> >> From: Andrija Panic <an...@gmail.com>
>> >>> >> Sent: Saturday, September 7, 2019 1:07:19 AM
>> >>> >> To: users <us...@cloudstack.apache.org>
>> >>> >> Cc: dev@cloudstack.apache.org <de...@cloudstack.apache.org>
>> >>> >> Subject: Re: 4.13 rbd snapshot delete failed
>> >>> >>
>> >>> >> storage.cleanup.delay
>> >>> >> storage.cleanup.interval
>> >>> >>
>> >>> >> set both to 60 (seconds) and wait for up to 2 min - the snapshot should be
>> >>> >> deleted just fine...
>> >>> >>
>> >>> >> cheers
>> >>> >>
>> >>> >> On Fri, 6 Sep 2019 at 18:52, li jerry <di...@hotmail.com> wrote:
>> >>> >>
>> >>> >>> Hello All
>> >>> >>>
>> >>> >>> When I tested ACS 4.13 KVM + Ceph snapshots, I found that snapshots could
>> >>> >>> be created and rolled back (using the API alone), but deletion could not be
>> >>> >>> completed.
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> After executing the deletion API, the snapshot disappears from the
>> >>> >>> snapshot list, but the snapshot on the Ceph RBD is not deleted (rbd snap
>> >>> >>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c).
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> Is there any way we can completely delete the snapshot?
>> >>> >>>
>> >>> >>> -Jerry
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>> >> --
>> >>> >>
>> >>> >> Andrija Panić
>> >>> >>
>> >>> >
>> >>>
>>
>

Re: 4.13 rbd snapshot delete failed

Posted by Andrija Panic <an...@gmail.com>.
Thx Gabriel - I've commented on the PR - needs some more love - but we're
almost there!

On Thu, 3 Oct 2019 at 20:46, Gabriel Beims Bräscher <ga...@gmail.com>
wrote:

> Hello folks,
>
> Just pinging that I have created PR
> https://github.com/apache/cloudstack/pull/3615 addressing the snapshot
> deletion issue #3586 (https://github.com/apache/cloudstack/issues/3586).
> Please, feel free to test and review.
>
> Regards,
> Gabriel.
>
> On Mon, Sep 9, 2019 at 12:08, Gabriel Beims Bräscher <gabrascher@gmail.com> wrote:
>
> > Thanks for the feedback Andrija and Andrei.
> >
> > I have opened issue #3590 for the snapshot rollback issue raised by
> > Andrija.
> > I will be investigating both issues:
> > - RBD snapshot Revert #3590 (
> > https://github.com/apache/cloudstack/issues/3590)
> > - RBD snapshot deletion #3586 (
> > https://github.com/apache/cloudstack/issues/3586)
> >
> > Cheers,
> > Gabriel
> >
> > On Mon, Sep 9, 2019 at 09:41, Andrei Mikhailovsky <andrei@arhont.com> wrote:
> >
> >> Some quick feedback from my side: snapshot deletion has never worked properly
> >> for me with Ceph. Every week or so I have to manually delete all Ceph
> >> snapshots. The NFS secondary storage snapshots, however, are deleted just
> >> fine. I've been using CloudStack for 5+ years and it has always been the case. I
> >> am currently running 4.11.2 with Ceph 13.2.6-1xenial.
> >>
> >> Andrei
> >>
> >> ----- Original Message -----
> >> > From: "Andrija Panic" <an...@gmail.com>
> >> > To: "Gabriel Beims Bräscher" <ga...@gmail.com>
> >> > Cc: "users" <us...@cloudstack.apache.org>, "dev" <
> >> dev@cloudstack.apache.org>
> >> > Sent: Sunday, 8 September, 2019 19:17:59
> >> > Subject: Re: 4.13 rbd snapshot delete failed
> >>
> >> > Thanks, Gabriel, for the extensive feedback.
> >> > Actually, my former company added the code to really delete an RBD snapshot
> >> > back in 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code
> >> > is there, but probably some exception is happening, or there is a regression...
> >> >
> >> > Cheers
> >> >
> >> > On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher <
> gabrascher@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> Thanks for the feedback, Andrija. It looks like deletion was not fully
> >> >> supported then (am I missing something?). I will look into this and
> >> >> open a PR adding proper support for RBD snapshot deletion if necessary.
> >> >>
> >> >> Regarding the rollback, I have tested it several times and it worked;
> >> >> however, I see a weak point in the Ceph rollback implementation.
> >> >>
> >> >> It looks like Li Jerry was able to execute the rollback without any
> >> >> problem. Li, could you please post the log output here: "Attempting to
> >> >> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
> >> >> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
> >> >> happens prior to it; the only way for you to check those values is via remote
> >> >> debugging. If you are able to post those values, it would also help with
> >> >> sorting out what is wrong.
> >> >>
> >> >> I am checking the code base, running a few tests, and evaluating the log
> >> >> that you (Andrija) sent. What I can say for now is that it looks like the
> >> >> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
> >> >> code that can definitely break the rollback execution flow. My tests had
> >> >> pointed to a pattern, but now I see other possibilities. I will probably
> >> >> add a few parameters to the rollback/revert command instead of using the
> >> >> path, or review the path life-cycle and the different execution flows in
> >> >> order to make it safer to use.
> >> >> [1]
> >> >>
> >>
> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
> >> >>
> >> >> A few details on the test environment and the Ceph/RBD versions:
> >> >> CloudStack, KVM, and Ceph nodes are running Ubuntu 18.04.
> >> >> Ceph version is 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable).
> >> >> RADOS Block Device (RBD) has had snapshot rollback support since Ceph v10.0.2
> >> >> [https://github.com/ceph/ceph/pull/6878].
> >> >> Rados-java [https://github.com/ceph/rados-java] has supported snapshot
> >> >> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack 4.13.0.0.
> >> >>
> >> >> I will be updating here soon.
> >> >>
> >> >> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander <wido@widodh.nl> wrote:
> >> >>
> >> >>>
> >> >>>
> >> >>> On 9/8/19 5:26 AM, Andrija Panic wrote:
> >> >>> > Many releases ago, deleting a Ceph volume snapshot was also only deleting
> >> >>> > it in the DB, so RBD performance became terrible with many tens of (i.e.
> >> >>> > hourly) snapshots. I'll try to verify this on 4.13 myself, but Wido and the
> >> >>> > guys will know better...
> >> >>>
> >> >>> I pinged Gabriel and he's looking into it. He'll get back to it.
> >> >>>
> >> >>> Wido
> >> >>>
> >> >>> >
> >> >>> > I
> >> >>> >
> >> >>> > On Sat, Sep 7, 2019, 08:34 li jerry <di...@hotmail.com> wrote:
> >> >>> >
> >> >>> >> I found it had nothing to do with storage.cleanup.delay and
> >> >>> >> storage.cleanup.interval.
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> The reason is that when DeleteSnapshotCmd is executed, because the RBD
> >> >>> >> snapshot has not been copied to secondary storage, it only changes the
> >> >>> >> database information and does not go to the primary storage to delete the
> >> >>> >> snapshot.
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> Log===========================
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet]
> >> >>> >> (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START===
> >> >>> 192.168.254.3
> >> >>> >> -- GET
> >> >>> >>
> >> >>>
> >>
> command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer]
> >> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8)
> CIDRs
> >> from
> >> >>> >> which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]'
> is
> >> >>> allowed
> >> >>> >> to perform API calls: 0.0.0.0/0,::/0
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer]
> >> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8)
> >> Retrieved
> >> >>> >> cmdEventType from job info: SNAPSHOT.DELETE
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,217 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
> >> >>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add
> >> >>> job-1378
> >> >>> >> into job monitoring
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> >> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8)
> submit
> >> >>> async
> >> >>> >> job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2,
> >> >>> >> instanceType: Snapshot, instanceId: 13, cmd:
> >> >>> >>
> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
> >> >>> cmdInfo:
> >> >>> >>
> >> >>>
> >>
> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
> >> >>> >>
> >> >>>
> >>
> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
> >> >>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode:
> >> 0,
> >> >>> >> result: null, initMsid: 2200502468634, completeMsid: null,
> >> lastUpdated:
> >> >>> >> null, lastPolled: null, created: null, removed: null}
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> >> >>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097)
> >> Executing
> >> >>> >> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType:
> >> Snapshot,
> >> >>> >> instanceId: 13, cmd:
> >> >>> >>
> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
> >> >>> cmdInfo:
> >> >>> >>
> >> >>>
> >>
> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
> >> >>> >>
> >> >>>
> >>
> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
> >> >>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode:
> >> 0,
> >> >>> >> result: null, initMsid: 2200502468634, completeMsid: null,
> >> lastUpdated:
> >> >>> >> null, lastPolled: null, created: null, removed: null}
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
> >> >>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8)
> >> ===END===
> >> >>> >> 192.168.254.3 -- GET
> >> >>> >>
> >> >>>
> >>
> command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
> >> >>> >> (AgentManager-Handler-12:null) (logid:) Seq
> 1-8660140608456756853:
> >> >>> Routing
> >> >>> >> from 2199066247173
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,305 DEBUG
> [o.a.c.s.s.XenserverSnapshotStrategy]
> >> >>> >> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4)
> >> >>> (logid:1cee5097)
> >> >>> >> Can't find snapshot on backup storage, delete it in db
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> -Jerry
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> ________________________________
> >> >>> >> From: Andrija Panic <an...@gmail.com>
> >> >>> >> Sent: Saturday, September 7, 2019 1:07:19 AM
> >> >>> >> To: users <us...@cloudstack.apache.org>
> >> >>> >> Cc: dev@cloudstack.apache.org <de...@cloudstack.apache.org>
> >> >>> >> Subject: Re: 4.13 rbd snapshot delete failed
> >> >>> >>
> >> >>> >> storage.cleanup.delay
> >> >>> >> storage.cleanup.interval
> >> >>> >>
> >> >>> >> set both to 60 (seconds) and wait for up to 2 min - the snapshot should be
> >> >>> >> deleted just fine...
> >> >>> >>
> >> >>> >> cheers
> >> >>> >>
> >> >>> >> On Fri, 6 Sep 2019 at 18:52, li jerry <di...@hotmail.com>
> wrote:
> >> >>> >>
> >> >>> >>> Hello All
> >> >>> >>>
> >> >>> >>> When I tested ACS 4.13 KVM + Ceph snapshots, I found that snapshots could
> >> >>> >>> be created and rolled back (using the API alone), but deletion could not be
> >> >>> >>> completed.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> After executing the deletion API, the snapshot disappears from the
> >> >>> >>> snapshot list, but the snapshot on the Ceph RBD is not deleted (rbd snap
> >> >>> >>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c).
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Is there any way we can completely delete the snapshot?
> >> >>> >>>
> >> >>> >>> -Jerry
> >> >>> >>>
> >> >>> >>>
> >> >>> >>
> >> >>> >> --
> >> >>> >>
> >> >>> >> Andrija Panić
> >> >>> >>
> >> >>> >
> >> >>>
> >>
> >
>


-- 

Andrija Panić
