Posted to dev@cloudstack.apache.org by Indra Pramana <in...@sg.or.id> on 2013/10/07 13:18:20 UTC

Running snapshot caused host to be disconnected

Dear all,

I did some tests on snapshots, since they are now supported for my Ceph RBD
primary storage in CloudStack 4.2. When I ran a snapshot of a particular
VM instance earlier, I noticed that it caused the host (where the VM
is running) to become disconnected.

Here's the excerpt from the agent.log:

http://pastebin.com/dxVV7stu

The management-server.log doesn't show much other than detecting that the
host was down and that HA was being activated:

http://pastebin.com/UeLiSm9K

Can anyone advise what is causing the problem? So far only one user is
taking snapshots and it has already caused issues on the host; I can't
imagine what would happen if multiple users took snapshots at the same time.

I read about snapshot job throttling, which is described in the manual:

http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/working-with-snapshots.html

But I am not sure whether this will help resolve the problem, since we
already encounter it with only one user performing a snapshot.

Can anyone advise how I can troubleshoot further and find a solution to the
problem?

Looking forward to your reply, thank you.

Cheers.

Re: Running snapshot caused host to be disconnected

Posted by Indra Pramana <in...@sg.or.id>.
Dear all,

I also found that while the RBD snapshot is running, CPU utilisation on
the KVM host shoots up very high, which might explain why the host becomes
disconnected.

top - 22:49:32 up 3 days, 19:31,  1 user,  load average: 7.85, 4.97, 3.47
Tasks: 297 total,   3 running, 294 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.5%us,  1.2%sy,  0.0%ni, 94.1%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264125244k total, 77203460k used, 186921784k free,   154888k buffers
Swap:   545788k total,        0k used,   545788k free, 60677092k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18161 root      20   0 3871m  31m 8444 S  101  0.0 301:58.09 kvm
 2790 root      20   0 43.5g 1.6g  19m S   97  0.7  45:52.42 jsvc
24544 root      20   0 4583m  31m 8364 S   97  0.0 425:29.48 kvm
 6537 root      20   0     0    0    0 R   71  0.0   0:17.49 kworker/3:2
22546 root      20   0 6143m 2.0g 8452 S   26  0.8  55:14.07 kvm
 4219 root      20   0 7671m 4.0g 8524 S    6  1.6 106:12.26 kvm
 5989 root      20   0 43.2g 1.6g  232 D    6  0.6   0:08.13 jsvc
 5993 root      20   0 43.3g 1.6g  224 D    6  0.6   0:08.36 jsvc

Is it normal for the host's CPU utilisation to be higher than usual while a
snapshot is being taken of a VM running on that host? How can I limit the
CPU resources used by the snapshot?
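
To make the top output above easier to read, here is a minimal Python sketch
that sums the %CPU column per command name. The function name cpu_by_command
is my own (hypothetical, not part of CloudStack or top), and it assumes the
default 12-column process layout shown above. It only summarises the sample;
it does not tell us which process is actually doing the snapshot work.

```python
from collections import defaultdict

# Process rows copied from the top output above.
TOP_ROWS = """\
18161 root      20   0 3871m  31m 8444 S  101  0.0 301:58.09 kvm
 2790 root      20   0 43.5g 1.6g  19m S   97  0.7  45:52.42 jsvc
24544 root      20   0 4583m  31m 8364 S   97  0.0 425:29.48 kvm
 6537 root      20   0     0    0    0 R   71  0.0   0:17.49 kworker/3:2
22546 root      20   0 6143m 2.0g 8452 S   26  0.8  55:14.07 kvm
 4219 root      20   0 7671m 4.0g 8524 S    6  1.6 106:12.26 kvm
 5989 root      20   0 43.2g 1.6g  232 D    6  0.6   0:08.13 jsvc
 5993 root      20   0 43.3g 1.6g  224 D    6  0.6   0:08.36 jsvc
"""


def cpu_by_command(rows):
    """Sum the %CPU column per COMMAND (hypothetical helper).

    Assumes the column order PID USER PR NI VIRT RES SHR S %CPU %MEM
    TIME+ COMMAND, as in the sample above.
    """
    totals = defaultdict(float)
    for line in rows.splitlines():
        fields = line.split()
        # Skip header, blank, or malformed lines.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        totals[fields[11]] += float(fields[8])  # COMMAND -> %CPU
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(cpu_by_command(TOP_ROWS).items(), key=lambda kv: -kv[1])
    for command, cpu in ranked:
        print(f"{command:12s} {cpu:6.1f}")
```

For this sample the totals are 230 for kvm, 109 for jsvc and 71 for
kworker/3:2, where 100 corresponds to one fully used core. Note that the
Cpu(s) summary line in the same sample still reports 94.1% idle, so the host
as a whole does not look CPU-starved at that moment; the two jsvc threads in
D state and the busy kworker may be worth a closer look.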

Looking forward to your reply, thank you.

Cheers.

