Posted to users@cloudstack.apache.org by Yiping Zhang <yz...@marketo.com> on 2016/05/04 22:22:32 UTC

[Urgent]: corrupt DB after VM live migration with storage migration

Hi, all:

I am in a situation that I need some help:

I did a live migration, with storage migration required, of a production VM instance from one cluster to another.  The first migration attempt failed after some time, but the second attempt succeeded. Throughout all this the VM instance has remained accessible (and it is still up and running).  However, when I use my API script to query volumes, it still reports that the volumes are on the old cluster’s primary storage.  If I shut down this VM, I am afraid that it won’t start again, as it would try to use non-existent volumes.
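
For reference, the check my script does is essentially a listVolumes API
call; via cloudmonkey it would look roughly like this (the VM id is a
placeholder, and listVolumes reports the pool in its "storage" field):

list volumes virtualmachineid=<vm-uuid> filter=name,state,storage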

Checking the database, sure enough, it still has the old info about these volumes:


mysql> select id,name from storage_pool where id=1 or id=8;
+----+------------------+
| id | name             |
+----+------------------+
|  1 | abprod-primary1  |
|  8 | abprod-p1c2-pri1 |
+----+------------------+
2 rows in set (0.01 sec)


Here the old cluster’s primary storage has id=1, and the new cluster’s primary storage has id=8.


Here are the entries with wrong info in the volumes table:


mysql> select id,name, uuid, path,pool_id, removed from volumes where name='ROOT-97' or name='DATA-97';
+-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
| id  | name    | uuid                                 | path                                 | pool_id | removed             |
+-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
| 124 | ROOT-97 | 224bf673-fda8-4ccc-9c30-fd1068aee005 | 5d1ab4ef-2629-4384-a56a-e2dc1055d032 |       1 | NULL                |
| 125 | DATA-97 | d385d635-9230-4130-8d1f-702dbcf0f22c | 6b75496d-5907-46c3-8836-5618f11dac8e |       1 | NULL                |
| 316 | ROOT-97 | 691b5c12-7ec4-408d-b66f-1ff041f149c1 | NULL                                 |       8 | 2016-05-03 06:10:40 |
| 317 | ROOT-97 | 8ba29fcf-a81a-4ca0-9540-0287230f10c7 | NULL                                 |       8 | 2016-05-03 06:10:45 |
+-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
4 rows in set (0.01 sec)
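
Since ROOT-97 and DATA-97 belong to instance 97 (assuming the usual
ROOT-<instance id> naming), the same rows can also be pulled via the
volumes.instance_id column, which is handy for checking the other
affected VMs:

mysql> select id,name,path,pool_id,removed from volumes where instance_id=97;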

On the XenServer hosts of the old cluster, the volumes no longer exist:


[root@abmpc-hv01 ~]# xe vdi-list name-label='ROOT-97'
[root@abmpc-hv01 ~]# xe vdi-list name-label='DATA-97'
[root@abmpc-hv01 ~]#

But the volumes are on the new cluster’s primary storage:


[root@abmpc-hv04 ~]# xe vdi-list name-label=ROOT-97
uuid ( RO)                : a253b217-8cdc-4d4a-a111-e5b6ad48a1d5
          name-label ( RW): ROOT-97
    name-description ( RW):
             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
        virtual-size ( RO): 34359738368
            sharable ( RO): false
           read-only ( RO): true

uuid ( RO)                : c46b7a61-9e82-4ea1-88ca-692cd4a9204b
          name-label ( RW): ROOT-97
    name-description ( RW):
             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
        virtual-size ( RO): 34359738368
            sharable ( RO): false
           read-only ( RO): false

[root@abmpc-hv04 ~]# xe vdi-list name-label=DATA-97
uuid ( RO)                : bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722
          name-label ( RW): DATA-97
    name-description ( RW):
             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
        virtual-size ( RO): 107374182400
            sharable ( RO): false
           read-only ( RO): false

uuid ( RO)                : a8c187cc-2ba0-4928-8acf-2afc012c036c
          name-label ( RW): DATA-97
    name-description ( RW):
             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
        virtual-size ( RO): 107374182400
            sharable ( RO): false
           read-only ( RO): true
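
To pick out the read/write copy directly, xe can filter on the read-only
field; for example, this prints just the UUID of the writable ROOT-97 VDI:

xe vdi-list name-label=ROOT-97 read-only=false params=uuid --minimal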


Following is how I plan to fix the corrupted DB entries. Note that I am using the UUID of the VDI with read/write access as the new path value:


1) For the ROOT-97 volume:

UPDATE volumes SET removed=NOW() WHERE id=124;
UPDATE volumes SET removed=NULL WHERE id=317;
UPDATE volumes SET path='c46b7a61-9e82-4ea1-88ca-692cd4a9204b' WHERE id=317;


2) For the DATA-97 volume:

UPDATE volumes SET pool_id=8 WHERE id=125;
UPDATE volumes SET path='bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722' WHERE id=125;
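
To make this easy to back out, I plan to wrap the updates in a
transaction and eyeball the rows before committing; a sketch of the same
statements (this assumes the cloud DB tables are InnoDB, which is the
default):

mysql> START TRANSACTION;
mysql> UPDATE volumes SET removed=NOW() WHERE id=124;
mysql> UPDATE volumes SET removed=NULL, path='c46b7a61-9e82-4ea1-88ca-692cd4a9204b' WHERE id=317;
mysql> UPDATE volumes SET pool_id=8, path='bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722' WHERE id=125;
mysql> SELECT id,name,path,pool_id,removed FROM volumes WHERE id IN (124,125,316,317);
mysql> COMMIT;
(or ROLLBACK; instead, if the SELECT shows anything off)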


Would this work?


Thanks for any help anyone can provide.  I have a total of 4 VM instances with 8 volumes in this situation that need to be fixed.


Yiping

Re: [Urgent]: corrupt DB after VM live migration with storage migration

Posted by Yiping Zhang <yz...@marketo.com>.
Thanks, it’s a good idea to back up those “removed” disks first before attempting DB surgery!
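
For the copies themselves, xe vdi-copy to another SR should do; a sketch,
with the destination SR uuid as a placeholder (the command prints the
UUID of the new copy):

xe vdi-copy uuid=c46b7a61-9e82-4ea1-88ca-692cd4a9204b sr-uuid=<backup-sr-uuid>
xe vdi-copy uuid=bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722 sr-uuid=<backup-sr-uuid>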




On 5/4/16, 9:57 PM, "ilya" <il...@gmail.com> wrote:

>Never mind - disks marked "removed" do get deleted by the cleanup job.

Re: [Urgent]: corrupt DB after VM live migration with storage migration

Posted by ilya <il...@gmail.com>.
Never mind - disks marked "removed" do get deleted by the cleanup job.

On 5/4/16 9:55 PM, ilya wrote:
> I'm pretty certain CloudStack does not have purging for data disks, as
> I had to write my own :)

Re: [Urgent]: corrupt DB after VM live migration with storage migration

Posted by ilya <il...@gmail.com>.
I'm pretty certain CloudStack does not have purging for data disks, as
I had to write my own :)

On 5/4/16 9:51 PM, Ahmad Emneina wrote:
> I'm not sure if the expunge interval/delay plays a part... but you might
> want to set storage.cleanup.enabled to false. That might prevent your
> disks from being purged. You might also look to export those volumes, or
> copy them to a safe location, out of band.

Re: [Urgent]: corrupt DB after VM live migration with storage migration

Posted by Ahmad Emneina <ae...@gmail.com>.
I'm not sure if the expunge interval/delay plays a part... but you might
want to set storage.cleanup.enabled to false. That might prevent your
disks from being purged. You might also look to export those volumes, or
copy them to a safe location, out of band.
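
You can flip that setting via cloudmonkey or straight in the DB; a sketch
of either route (note that storage.cleanup.enabled is generally only
picked up after a management-server restart):

cloudmonkey update configuration name=storage.cleanup.enabled value=false

or

mysql> UPDATE configuration SET value='false' WHERE name='storage.cleanup.enabled';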

On Wed, May 4, 2016 at 8:49 PM, Yiping Zhang <yz...@marketo.com> wrote:

> Before I try the direct DB modifications, I would first:
>
> * shutdown the VM instances
> * stop cloudstack-management service
> * do a DB backup with mysqldump
>
> What I worry about the most is that the volumes on the new cluster’s primary
> storage are marked as “removed”, so if I shut down the instances, CloudStack
> may kick off a storage cleanup job to remove them from the new cluster’s
> primary storage before I can get the fixes in.
>
> Is there a way to temporarily disable storage cleanups?
>
> Yiping

Re: [Urgent]: corrupt DB after VM live migration with storage migration

Posted by ilya <il...@gmail.com>.
Yiping,

We've dealt with many corruptions in the past, mostly around VMware, as
it would eat up disks from time to time, or someone would move a VM out
of band by doing a storage or cluster vMotion.

The solution you described should work.

However, to be extra paranoid:

Step 1: take a full DB backup (a sketch follows below).
Step 2: back up the root and data disks under some other file name, just
in case.
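
For the DB backup, a minimal sketch ('cloud' is the default CloudStack DB
name; add cloud_usage if you want the usage records too):

mysqldump -u root -p --single-transaction cloud > cloud-backup-$(date +%F).sql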

Then proceed with your proposed solution.

As long as you have proper backups, you should be OK. If the VM fails to
start, the logs will tell you where CloudStack expects the volume to be;
you can either move the volume there or update the CloudStack volumes
table to point it at the correct pool_id.
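
E.g. something like this against the default management-server log
location:

grep ROOT-97 /var/log/cloudstack/management/management-server.log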

Regards
ilya


On 5/4/16 8:49 PM, Yiping Zhang wrote:
> Before I try the direct DB modifications, I would first:
> 
> * shutdown the VM instances
> * stop cloudstack-management service
> * do a DB backup with mysqldump
> 
> What I worry about the most is that the volumes on the new cluster’s primary storage are marked as “removed”, so if I shut down the instances, CloudStack may kick off a storage cleanup job to remove them from the new cluster’s primary storage before I can get the fixes in.
> 
> Is there a way to temporarily disable storage cleanups?
> 
> Yiping

Re: [Urgent]: corrupt DB after VM live migration with storage migration

Posted by Yiping Zhang <yz...@marketo.com>.
Before I try the direct DB modifications, I would first:

* shutdown the VM instances
* stop cloudstack-management service
* do a DB backup with mysqldump

What I worry about the most is that the volumes on the new cluster’s primary storage are marked as “removed”, so if I shut down the instances, CloudStack may kick off a storage cleanup job to remove them from the new cluster’s primary storage before I can get the fixes in.

Is there a way to temporarily disable storage cleanups?

Yiping



