Posted to issues@cloudstack.apache.org by "Vincent Vuong (JIRA)" <ji...@apache.org> on 2014/11/02 07:28:33 UTC

[jira] [Created] (CLOUDSTACK-7827) storage migration timeout, loss of data

Vincent Vuong created CLOUDSTACK-7827:
-----------------------------------------

             Summary: storage migration timeout, loss of data
                 Key: CLOUDSTACK-7827
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7827
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
    Affects Versions: 4.4.1
         Environment: CentOS 6.5, Xenserver 6.2 with latest patches, Cloudstack 4.4.1
            Reporter: Vincent Vuong
            Priority: Critical


If a volume migration does not complete before the CloudStack timeout is reached, the VM cannot be started again once it has been stopped.  We have observed this behavior with CloudStack 4.1 – 4.4.  Data will be lost if the admin stops the VM before identifying the new VHD chain.  Here are the steps to reproduce:

1)	Execute a storage migration on a running VM that will exceed the CloudStack timeout value.
2)	The storage migration fails, with CloudStack reporting “Host timed out”, but XenServer continues with the volume migration.
3)	After XenServer completes the volume migration, it deletes the original VHD chain.  The volume “PATH” in the CloudStack database is not updated with the new VHD chain.
4)	The VM cannot be started after being stopped.  Once the VM has stopped, there is no way to find out what the new VHD chain is.

Workaround:
1)	While the VM is still running, run the following command to find the new VHD file name:  xe vbd-list vm-uuid=
2)	Stop the VM, copy the VHD chain back to the original primary storage, and update the volume “PATH” in the CloudStack database with the new VHD chain.
3)	Start the VM.
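
The first step above can be scripted so that the VDI UUIDs are recorded before anyone stops the VM. What follows is only a rough sketch in Python, not part of CloudStack or XenServer. It assumes the output of xe vbd-list is the usual block of "name ( RO): value" lines with records separated by blank lines, and that an empty drive shows the placeholder "<not in database>" for vdi-uuid. The function names parse_xe_records and vdi_uuids are made up for this illustration.

```python
def parse_xe_records(output):
    """Parse `xe ...-list` style text into a list of dicts.

    Assumes one "name ( RO): value" pair per line and blank lines
    between records; lines without a colon are ignored.
    """
    records = []
    current = {}
    for line in output.splitlines():
        if not line.strip():
            # A blank line closes the record being built, if any.
            if current:
                records.append(current)
                current = {}
            continue
        key_part, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop the "( RO)" / "( RW)" access marker from the field name.
        key = key_part.split("(")[0].strip()
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def vdi_uuids(output):
    """Return the vdi-uuid of every VBD that has a disk attached."""
    placeholder = "<not in database>"  # assumed marker for empty drives
    return [
        rec["vdi-uuid"]
        for rec in parse_xe_records(output)
        if rec.get("vdi-uuid") and rec["vdi-uuid"] != placeholder
    ]
```

On the XenServer host, the text passed to vdi_uuids would be the captured output of the xe vbd-list command from step 1. The result should be saved somewhere outside the VM, since it is the only record of the new VHD chain once the VM is stopped.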

2014-11-01 21:16:56,887 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Work-Job-Executor-3:ctx-80290066 job-174/job-175 ctx-c104adfc) copy failed
com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:4, com.cloud.exception.OperationTimedoutException: Commands 1959910262836298211 to Host 4 timed out after 3600
        at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
        at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:383)
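
Since the VM must not be stopped after such a timeout, it may help to scan the management server log for the message shown above. The following Python snippet is only a sketch based on the single log line in this report. The regular expression and the function name find_timeouts are assumptions made for illustration, and the log does not state the unit of the timeout value, so it is returned as a bare number.

```python
import re

# Pattern modelled on the one log line quoted in this report.
TIMEOUT_RE = re.compile(
    r"OperationTimedoutException: Commands (\d+) to Host (\d+) "
    r"timed out after (\d+)"
)


def find_timeouts(log_text):
    """Return one dict per command timeout found in the log text."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append({
                "command": match.group(1),   # kept as text, it is an opaque id
                "host": int(match.group(2)),
                "timeout": int(match.group(3)),
            })
    return hits
```

A hit only says that some command to a host timed out. Whether it was a volume migration still has to be checked against the surrounding stack trace, for example by looking for migrateVolumeToPool as in the trace above.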


