Posted to users@cloudstack.apache.org by Matthew Hartmann <mh...@tls.net> on 2012/12/03 18:08:08 UTC

XenServer & VM Snapshots

Hello! I'm hoping someone can help me troubleshoot the following issue:

 

I have a client who has a 960G data volume which contains their VM's
Exchange Data Store. When starting a snapshot, I found that a process named
"sparse_dd" is started on one of my Compute Nodes. This process then sends
its output through another Compute Node's xapi before placing it into the
"snapshot store" on Secondary Storage. This appears to be part of the
bottleneck: all of our systems are connected via gigabit links, so creating
a snapshot should not take 15+ hours. The following is the behavior I have
observed in my environment:

 

1)     Snapshot is started (either manually or on a schedule).

2)     Compute Node 1 "processes the snapshot" by exposing the VDI, from
which "sparse_dd" then creates a "thin provisioned" snapshot.

3)     The output of sparse_dd is delivered over HTTP to xapi on Compute
Node 2 where the Management Server mounted Secondary Storage.

4)     Compute Node 2 (receiving the snapshot via xapi) stores the snapshot
in the Secondary Storage mount point.
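The "thin provisioned" copy in step 2 rests on a general sparse-copy idea: read the source in fixed-size blocks and, instead of writing blocks that are entirely zero, seek past them so the destination keeps holes. The Python sketch below illustrates only that idea; it is not XenServer's sparse_dd, which reads VDIs and streams to xapi over HTTP. The block size and the names sparse_copy and roundtrip are hypothetical.

```python
import os
import tempfile

BLOCK_SIZE = 2 * 1024 * 1024  # hypothetical block size (2 MiB)


def sparse_copy(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src to dst, seeking over all-zero blocks so dst stays sparse.

    Returns the number of bytes actually written (non-zero blocks only).
    """
    zero_block = bytes(block_size)
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zero_block[: len(block)]:
                # All zeros: leave a hole instead of writing.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
                written += len(block)
        # Extend the file to its full logical size if it ends in a hole.
        dst.truncate(dst.tell())
    return written


def roundtrip(data: bytes, block_size: int):
    """Write data to a temp file, sparse-copy it, and return
    (copied bytes, bytes actually written, destination size)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.img")
        dst = os.path.join(d, "dst.img")
        with open(src, "wb") as f:
            f.write(data)
        written = sparse_copy(src, dst, block_size)
        with open(dst, "rb") as f:
            copied = f.read()
        return copied, written, os.path.getsize(dst)
```

Skipping zero blocks saves writes and destination space, but every block still has to be read, so the read side of the copy scales with the full volume size.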

 

Based on this behavior, I have devised the following logic that I believe
CloudStack is using:

 

1)     CloudStack creates a "snapshot VDI" via XenServer Pool Master's API.

2)     CloudStack finds a Compute Node that can mount Secondary Storage.

3)     CloudStack finds a Compute Node that can run "sparse_dd".

4)     CloudStack uses an available Compute Node to stream the VDI to xapi on
the Compute Node that mounted Secondary Storage.

 

I must mention that the Compute Node that runs sparse_dd and the one that
mounts Secondary Storage are not always the same. It appears the Management
Server is simply round-robining through the list of Compute Nodes and using
the first one that is available.
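A toy model of the two placements being discussed: two independent round-robin picks (which would match what Matthew saw on CS 3.0.2) versus one host doing both the mount and the copy (which Anthony believes is the case in 3.0.4 and 3.0.5). The hop count follows Matthew's description: when the hosts differ, sparse_dd output travels over HTTP to the other host's xapi and then on to Secondary Storage. Host names and functions are hypothetical; this is not CloudStack's host-selection code.

```python
from itertools import cycle


def independent_picks(hosts, snapshots):
    """Mount host and copy host each take the next entry of one round-robin."""
    rr = cycle(hosts)
    return [(next(rr), next(rr)) for _ in range(snapshots)]


def same_host_picks(hosts, snapshots):
    """One host per snapshot does both the mount and the copy."""
    rr = cycle(hosts)
    return [(h, h) for h, _ in zip(rr, range(snapshots))]


def network_hops(mount_host, copy_host):
    # Simplified: counts hops after sparse_dd has read the VDI.
    # Same host: written straight to the NFS mount (1 hop).
    # Different hosts: HTTP to the other host's xapi, then NFS (2 hops).
    return 1 if mount_host == copy_host else 2
```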

 

Does anyone have any input on the issue I'm having or analysis of how
CloudStack/XenServer snapshots operate?

 

Thanks!

 

Cheers,

 

Matthew

 

 

 


Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net


 
TLS.NET, Inc. - http://www.tls.net



 

 


Re: XenServer & VM Snapshots

Posted by Nitin Mehta <Ni...@citrix.com>.
On 04-Dec-2012, at 12:44 PM, Marc Cirauqui wrote:

If I may, we've detected very poor performance executing snapshots.

We think it's due to XenServer's API. I don't know how or why, but the API
is very slow and runs one task at a time (if it parallelizes at all, the
effect is negligible).

Do you know if there's a way to improve IO rates on the XS side?

thx.


Marc - I think there has been some work on the CS side as well to improve performance a bit; I believe the work described at the link below has been done. You can tweak the parameter concurrent.snapshots.threshold.perhost to achieve better performance based on your workload.
More info @ https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS
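One way to picture a per-host snapshot threshold is a semaphore per host: at most N snapshot jobs run at once on a given host and the rest wait their turn. The Python sketch below assumes that semantics for concurrent.snapshots.threshold.perhost; the class, host names and job function are hypothetical, and CloudStack's actual enforcement may differ (the wiki page linked above describes the real design).

```python
import threading
import time
from collections import defaultdict


class PerHostLimiter:
    """Hypothetical model of a per-host cap on concurrent snapshot jobs."""

    def __init__(self, threshold_per_host: int):
        self._threshold = threshold_per_host
        self._lock = threading.Lock()
        self._sems = {}                  # host -> Semaphore(threshold)
        self._active = defaultdict(int)  # jobs currently running per host
        self.peak = defaultdict(int)     # highest concurrency seen per host

    def _semaphore(self, host: str) -> threading.Semaphore:
        # Create each host's semaphore lazily, under the lock.
        with self._lock:
            if host not in self._sems:
                self._sems[host] = threading.Semaphore(self._threshold)
            return self._sems[host]

    def run(self, host: str, job):
        """Run job() once a slot on host is free; extra callers block."""
        with self._semaphore(host):
            with self._lock:
                self._active[host] += 1
                self.peak[host] = max(self.peak[host], self._active[host])
            try:
                return job()
            finally:
                with self._lock:
                    self._active[host] -= 1


def slow_snapshot_job():
    # Stand-in for a long-running snapshot backup.
    time.sleep(0.05)
    return "done"
```

Raising the threshold lets more backups overlap on one host, which only helps if that host's network and storage path are not already saturated.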



On Mon, Dec 3, 2012 at 8:07 PM, Matthew Hartmann <mh...@tls.net> wrote:

Thank you Anthony! :)

Cheers,

Matthew





-----Original Message-----
From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
Sent: Monday, December 03, 2012 1:59 PM
To: 'Cloudstack Developers'; cloudstack-users@incubator.apache.org
Subject: RE: XenServer & VM Snapshots

CS 3.0.2 is too old a version.

I'm pretty sure mount & copy happen on the same host in 3.0.4 and 3.0.5.
If mount & copy were on different hosts, this issue would be very likely to
happen. I haven't heard of this issue from QA or users.

I just checked the vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins,
which mounts secondary storage just before running sparse_dd.

I recommend upgrading to a newer version.

If you still see the issue, please post the related management server log
and /var/log/SMlog from XenServer.


Anthony

-----Original Message-----
From: Matthew Hartmann [mailto:mhartmann@tls.net]
Sent: Monday, December 03, 2012 10:31 AM
To: cloudstack-users@incubator.apache.org
Cc: 'Cloudstack Developers'
Subject: RE: XenServer & VM Snapshots

Anthony:

Thank you for the prompt and informative reply.

> I'm pretty sure mount and copy are using the same XenServer host.

The behavior I have witnessed with CS 3.0.2 is that it doesn't always do the
mount & copy on the same host. Out of the 12 tests I've performed, only once
was the mount & copy performed on the same host that the VM was running on.

> I think the issue is the backup takes a long time because the data volume
> is big and network rate is low.
> You can increase "BackupSnapshotWait" in global configuration table to let
> the backup operation finish.

I increased this in global settings from the default of 9 hours to 16 hours.
The snapshot still doesn't complete in time; on average it copies about 460G
before it times out. I'm pretty confident the network rate isn't the
bottleneck, as ISOs and imported VHDs install quickly. We have the Secondary
Storage server set as the only internal site allowed to host files. I upload
my ISO or VHD to the Secondary Storage server and install it using the SSVM,
which completes in a very timely manner. With a 1Gb network link, 1TB should
copy in roughly 2 hours (if the copy process saturates the link); I've only
found snapshotting (template creation appears to work flawlessly) to take an
insanely long time to complete.
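Matthew's numbers can be checked with some arithmetic. A saturated 1 Gb/s link carries at most 125 MB/s, so 1 TB takes about 2.2 hours; copying about 460G in a 16-hour window averages only about 8 MB/s, roughly 6% of the link. The sketch assumes decimal units (1G = 10^9 bytes) and ignores protocol overhead.

```python
GBIT = 1e9  # bits per second on a 1 Gb/s link (decimal units)


def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600


def effective_rate(bytes_copied: float, hours: float) -> float:
    """Average bytes per second actually achieved."""
    return bytes_copied / (hours * 3600)


link_rate = GBIT / 8                           # 125 MB/s, ideal, no overhead
ideal_1tb = transfer_hours(1e12, link_rate)    # ~2.2 h
observed = effective_rate(460e9, 16)           # ~8.0 MB/s
utilization = observed * 8 / GBIT              # ~6.4% of the link
full_volume = transfer_hours(960e9, observed)  # ~33 h at the observed rate

print(f"1 TB at line rate: {ideal_1tb:.1f} h")
print(f"observed rate: {observed / 1e6:.1f} MB/s ({utilization:.1%} of link)")
print(f"960G at observed rate: {full_volume:.1f} h")
```

At that observed rate the full 960G volume would need about 33 hours, which is consistent with the copy timing out partway through even with a 16-hour wait.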

Is there anything else I can do to increase performance, or any logs I
should check?

Cheers,

Matthew




-----Original Message-----
From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
Sent: Monday, December 03, 2012 12:31 PM
To: Cloudstack Users
Cc: Cloudstack Developers
Subject: RE: XenServer & VM Snapshots

Hi Matthew,

Your analysis is correct except for the following:

> I must mention that the same Compute Node that ran sparse_dd or mounted
> Secondary Storage is not always the same. It appears the Management Server
> is simply round-robining through the list of Compute Nodes and using the
> first one that is available.

I'm pretty sure mount and copy are using the same XenServer host.

I think the issue is that the backup takes a long time because the data
volume is big and the network rate is low. You can increase
"BackupSnapshotWait" in the global configuration table to let the backup
operation finish.
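If the timeout route is taken, the required wait can be estimated from the volume size and a measured sustained rate, plus a safety margin. A minimal Python sketch; it assumes the setting is expressed in seconds (check its description in your version's global settings), and the 1.25 margin and the illustrative 8 MB/s rate (about what 460G in 16 hours works out to) are assumptions.

```python
import math


def required_wait_seconds(size_bytes: float, rate_bytes_per_s: float,
                          margin: float = 1.25) -> int:
    """Seconds a backup needs at a sustained rate, padded by a margin."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(size_bytes / rate_bytes_per_s * margin)


def fits(size_bytes: float, rate_bytes_per_s: float, wait_seconds: float) -> bool:
    """True if an unpadded copy finishes inside the configured wait."""
    return size_bytes / rate_bytes_per_s <= wait_seconds
```

For example, required_wait_seconds(960e9, 8e6) asks for 150,000 seconds (over 41 hours), while fits(960e9, 125e6, 9 * 3600) shows that a 9-hour wait would be ample at full line rate.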


Since CS takes advantage of XenServer's VHD image format and uses VHD for
snapshots and clones, snapshots have to be backed up through a XenServer
host. The ideal solution for this issue might be to leverage the storage
server's own snapshot and clone functionality; snapshot backup would then be
executed by the storage host, relieving some of this limitation.
Currently CS doesn't support this, but it should not be hard to support
after Edison finishes the storage framework change; it should be just
another storage plug-in. When CS uses the storage server's snapshot and
clone functions, it needs to consider the storage server's limits on the
number of snapshots and the number of volumes.
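The limit check Anthony mentions could look something like the following pre-check before delegating a snapshot to the storage server. Everything in this Python sketch (the class names, fields, and the fallback to the host-based VHD path) is hypothetical and is not CloudStack's storage framework API.

```python
from dataclasses import dataclass


@dataclass
class StorageLimits:
    max_snapshots: int
    max_volumes: int


@dataclass
class StorageUsage:
    snapshots: int
    volumes: int


def choose_backup_path(limits: StorageLimits, usage: StorageUsage,
                       new_volumes: int = 0) -> str:
    """Pick where a snapshot would be handled (hypothetical policy).

    'storage': the storage server takes the snapshot/clone itself.
    'host':    fall back to the XenServer host path (VHD copy via sparse_dd).
    """
    if usage.snapshots + 1 > limits.max_snapshots:
        return "host"
    if usage.volumes + new_volumes > limits.max_volumes:
        return "host"
    return "storage"
```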


Anthony

Re: XenServer & VM Snapshots

Posted by Marc Cirauqui <mc...@gmail.com>.
If I may, we've detected very poor performance executing snapshots.

We think it's due to XenServer's API. I don't know how or why, but the API
is very slow and runs one task at a time (if it parallelizes at all, the
effect is negligible).

Do you know if there's a way to improve IO rates on the XS side?

thx.




Re: XenServer & VM Snapshots

Posted by Marc Cirauqui <mc...@gmail.com>.
If I may, we've detected very poor performance executing snapshots.

We think it's due to XenServer's API, I don't know how and why, but the API
is very slow and runs one task at a time (if it is doing paralelization
it's almost nothing).

Do you know if there's a way to improve IO rates on XS side?

thx.



On Mon, Dec 3, 2012 at 8:07 PM, Matthew Hartmann <mh...@tls.net> wrote:

> Thank you Anthony! :)
>
> Cheers,
>
> Matthew
>
>
>
> Matthew Hartmann
> Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
>
> TLS.NET, Inc.
> http://www.tls.net
>
>
> -----Original Message-----
> From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> Sent: Monday, December 03, 2012 1:59 PM
> To: 'Cloudstack Developers'; cloudstack-users@incubator.apache.org
> Subject: RE: XenServer & VM Snapshots
>
> CS 3.0.2 is too old version.
>
> I'm pretty sure mount & copy on the same host in 3.0.4 and 3.0.5.
> If mount & copy might be on different hosts, the issue is very likely to
> happen.
> I didn't hear this issue from QA and users.
>
> I just checked vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins,
> Which mounts secondary storage just before sparse-dd.
>
> I recommend you to upgrade new version.
>
> If you still see the issue,
>
> Please post related management server log and /var/log/SMlog in XenServer.
>
>
> Anthony
>
>
>
>
>
>
>
>
>
>
>
> > -----Original Message-----
> > From: Matthew Hartmann [mailto:mhartmann@tls.net]
> > Sent: Monday, December 03, 2012 10:31 AM
> > To: cloudstack-users@incubator.apache.org
> > Cc: 'Cloudstack Developers'
> > Subject: RE: XenServer & VM Snapshots
> >
> > Anthony:
> >
> > Thank you for the prompt and informative reply.
> >
> > > I'm pretty sure mount and copy are using the same XenServe host.
> >
> > The behavior I have witnessed with CS 3.0.2 is that it doesn't always
> > do the
> > mount & copy on the same host. Out of the 12 tests I've performed, only
> > once
> > was the mount & copy performed on the same host that the VM was running
> > on.
> >
> > > I think the issue is the backup takes a long time because the data
> > volume
> > is big and network rate is low.
> > > You can increase "BackupSnapshotWait" in global configuration table
> > to let
> > the backup operation finish.
> >
> > I increased this in global settings from the default of 9 hours to 16
> > hours.
> > The snapshot still doesn't complete on time; it on average copies about
> > ~460G before it times out. I'm pretty confident the network rate isn't
> > the
> > bottle neck as ISOs and imported VHDs install quickly. We have the
> > Secondary
> > Storage server set as the only internal site allowed to host files. I
> > upload
> > my ISO or VHD to Secondary Storage server and install using SSVM which
> > completes in a very timely manner. With a 1Gb network link, 1TB should
> > copy
> > in roughly 2 hours (if the link is saturated by the copy process); I've
> > only
> > found snapshotting (template creation appears to work flawlessly) to
> > take an
> > insanely long time to complete.
> >
> > Is there anything else I can do to increase performance or logs I
> > should
> > check?
> >
> > Cheers,
> >
> > Matthew
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> >
> > TLS.NET, Inc.
> > http://www.tls.net
> >
> >
> > -----Original Message-----
> > From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> > Sent: Monday, December 03, 2012 12:31 PM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: RE: XenServer & VM Snapshots
> >
> > Hi Matthew,
> >
> > Your analysis is correct, except for the following:
> >
> > > I must mention that the same Compute Node that ran sparse_dd or mounted
> > > Secondary Storage is not always the same. It appears the Management Server
> > > is simply round-robining through the list of Compute Nodes and using the
> > > first one that is available.
> >
> > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > I think the issue is that the backup takes a long time because the data
> > volume is big and the network rate is low.
> > You can increase "BackupSnapshotWait" in the global configuration table to
> > let the backup operation finish.
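Besides the global settings page, the same value can be changed through the CloudStack API's `updateConfiguration` command. The sketch below only builds a signed request URL using the documented signing steps (sort parameters, URL-encode values, lowercase the string, HMAC-SHA1 with the secret key, Base64, then URL-encode the signature). Several things here are assumptions: the setting shown as "BackupSnapshotWait" is taken to be stored under the key `backup.snapshot.wait` with a value in seconds (confirm with `listConfigurations` on your version), and the host name, API key and secret key are hypothetical placeholders. Some global settings may also only take effect after a management server restart.

```python
import base64
import hashlib
import hmac
import urllib.parse

# Hypothetical placeholders: replace with your management server URL and the
# API/secret keys of an admin account.
API_URL = "http://mgmt.example.com:8080/client/api"
API_KEY = "YOUR_API_KEY"
SECRET_KEY = "YOUR_SECRET_KEY"


def _encode(value) -> str:
    # Percent-encode a value; spaces become %20 and '/' is encoded too.
    return urllib.parse.quote(str(value), safe="")


def _query(params: dict) -> str:
    # Parameters sorted by (lowercased) name, values URL-encoded.
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    return "&".join(f"{k}={_encode(v)}" for k, v in pairs)


def sign(params: dict, secret_key: str) -> str:
    """Base64 HMAC-SHA1 signature over the lowercased, sorted query string."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      _query(params).lower().encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def build_signed_url(command: str, api_url: str, api_key: str,
                     secret_key: str, **args) -> str:
    params = {"command": command, "apiKey": api_key, "response": "json", **args}
    signature = _encode(sign(params, secret_key))
    return f"{api_url}?{_query(params)}&signature={signature}"


if __name__ == "__main__":
    # Assumed key name and unit: backup.snapshot.wait, in seconds (16 h = 57600 s).
    url = build_signed_url("updateConfiguration", API_URL, API_KEY, SECRET_KEY,
                           name="backup.snapshot.wait", value=57600)
    print(url)
    # To actually send it: urllib.request.urlopen(url).read()
```

Keeping URL construction separate from sending makes it easy to inspect the request (or paste it into a browser or `curl`) before changing anything on a production management server.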
> >
> >
> > Since CS takes advantage of XenServer's VHD image format and uses VHD to do
> > snapshots and clones, snapshots must be backed up through a XenServer host.
> > The ideal solution for this issue might be to leverage the storage server's
> > snapshot and clone functionality, so that the snapshot backup is executed
> > by the storage host, relieving some of this limitation.
> > CS doesn't currently support this, but it should not be hard to support
> > once Edison finishes the storage framework change; it should be just
> > another storage plug-in. When CS uses the storage server's snapshot and
> > clone functions, it will need to consider the storage server's limits on
> > the number of snapshots and volumes.
> >
> >
> > Anthony
> >
> > From: Matthew Hartmann [mailto:mhartmann@tls.net]
> > Sent: Monday, December 03, 2012 9:08 AM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: XenServer & VM Snapshots
> >
> > Hello! I'm hoping someone can help me troubleshoot the following issue:
> >
> > I have a client who has a 960G data volume which contains their VM's
> > Exchange Data Store. When starting a snapshot, I found that a process titled
> > "sparse_dd" is started on one of my Compute Nodes. I found that this process
> > then sends the output of "sparse_dd" through another Compute Node's xapi
> > before placing it into the "snapshot store" on Secondary Storage. This
> > appears to be part of the bottleneck, as all of our systems are connected
> > via gigabit links and it should not take 15+ hours to create a snapshot.
> > The following is the behavior that I have analyzed within my environment:
> >
> >
> > 1)     Snapshot is started (either manually or on a schedule).
> >
> > 2)     Compute Node 1 "processes the snapshot" by exposing the VDI, from
> > which "sparse_dd" then creates a "thin provisioned" snapshot.
> >
> > 3)     The output of sparse_dd is delivered over HTTP to xapi on Compute
> > Node 2, where the Management Server mounted Secondary Storage.
> >
> > 4)     Compute Node 2 (receiving the snapshot via xapi) stores the snapshot
> > in the Secondary Storage mount point.
> >
> > Based on this behavior, I have devised the following logic that I believe
> > CloudStack is utilizing:
> >
> >
> > 1)     CloudStack creates a "snapshot VDI" via the XenServer Pool Master's
> > API.
> >
> > 2)     CloudStack finds a Compute Node that can mount Secondary Storage.
> >
> > 3)     CloudStack finds a Compute Node that can run "sparse_dd".
> >
> > 4)     CloudStack uses an available Compute Node to output the VDI to xapi
> > on the Compute Node that mounted Secondary Storage.
> >
> > I must mention that the same Compute Node that ran sparse_dd or mounted
> > Secondary Storage is not always the same. It appears the Management Server
> > is simply round-robining through the list of Compute Nodes and using the
> > first one that is available.
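Why the host placement matters can be illustrated with a toy model of the data path described above. This is not CloudStack code: it is a sketch that simply counts how many times each copied byte crosses the network, assuming shared (network) primary storage, an NFS-mounted Secondary Storage, and an extra HTTP hop only when sparse_dd and the Secondary Storage mount are on different hosts. The class, function and node names are hypothetical, and the model ignores whether the hops share the same NIC or switch port.

```python
from dataclasses import dataclass


@dataclass
class BackupPlacement:
    """Where the pieces of one snapshot backup run (toy model)."""
    copy_host: str                   # host running sparse_dd
    mount_host: str                  # host that mounted Secondary Storage
    primary_is_network: bool = True  # assumption: shared NFS/iSCSI primary storage


def network_crossings(p: BackupPlacement) -> int:
    """How many times each copied byte crosses the network in this model."""
    crossings = 1  # final write into the NFS-mounted Secondary Storage
    if p.primary_is_network:
        crossings += 1  # sparse_dd reads the VDI from shared primary storage
    if p.copy_host != p.mount_host:
        crossings += 1  # extra HTTP hop from the sparse_dd host to xapi on the mount host
    return crossings


if __name__ == "__main__":
    size_gb = 960  # the data volume discussed in this thread
    for placement in (BackupPlacement("node1", "node1"),
                      BackupPlacement("node1", "node2")):
        n = network_crossings(placement)
        print(f"{placement.copy_host} -> {placement.mount_host}: "
              f"{n} crossings, {n * size_gb} GB on the wire")
```

Under these assumptions, splitting the copy and the mount across two hosts adds one full extra transfer of the volume, which is consistent with the reply below that the problem is very likely when mount and copy happen on different hosts.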
> >
> > Does anyone have any input on the issue I'm having or analysis of how
> > CloudStack/XenServer snapshots operate?
> >
> > Thanks!
> >
> > Cheers,
> >
> > Matthew
> >
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> >

RE: XenServer & VM Snapshots

Posted by Matthew Hartmann <mh...@tls.net>.
Thank you Anthony! :)

Cheers,

Matthew



Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net

TLS.NET, Inc.
http://www.tls.net


-----Original Message-----
From: Anthony Xu [mailto:Xuefei.Xu@citrix.com] 
Sent: Monday, December 03, 2012 1:59 PM
To: 'Cloudstack Developers'; cloudstack-users@incubator.apache.org
Subject: RE: XenServer & VM Snapshots

CS 3.0.2 is too old a version.

I'm pretty sure mount & copy happen on the same host in 3.0.4 and 3.0.5.
If mount & copy were on different hosts, this issue would be very likely to
happen. I haven't heard of this issue from QA or users.

I just checked the vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins,
which mounts secondary storage just before running sparse_dd.

I recommend that you upgrade to a newer version.

If you still see the issue, please post the related management server log and
/var/log/SMlog from XenServer.


Anthony


> -----Original Message-----
> From: Matthew Hartmann [mailto:mhartmann@tls.net]
> Sent: Monday, December 03, 2012 10:31 AM
> To: cloudstack-users@incubator.apache.org
> Cc: 'Cloudstack Developers'
> Subject: RE: XenServer & VM Snapshots
> 
> Anthony:
> 
> Thank you for the prompt and informative reply.
> 
> > I'm pretty sure mount and copy are using the same XenServer host.
> 
> The behavior I have witnessed with CS 3.0.2 is that it doesn't always do the
> mount & copy on the same host. Out of the 12 tests I've performed, only once
> was the mount & copy performed on the same host that the VM was running on.
> 
> > I think the issue is that the backup takes a long time because the data
> > volume is big and the network rate is low.
> > You can increase "BackupSnapshotWait" in the global configuration table
> > to let the backup operation finish.
> 
> I increased this in global settings from the default of 9 hours to 16 hours.
> The snapshot still doesn't complete in time; on average it copies about 460G
> before it times out. I'm pretty confident the network rate isn't the
> bottleneck, as ISOs and imported VHDs install quickly. We have the Secondary
> Storage server set as the only internal site allowed to host files. I upload
> my ISO or VHD to the Secondary Storage server and install using the SSVM,
> which completes in a very timely manner. With a 1Gb network link, 1TB should
> copy in roughly 2 hours (if the link is saturated by the copy process); I've
> only found snapshotting (template creation appears to work flawlessly) to
> take an insanely long time to complete.
> 
> Is there anything else I can do to increase performance, or any logs I
> should check?
> 
> Cheers,
> 
> Matthew
> 
> 
> Matthew Hartmann
> Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> 
> TLS.NET, Inc.
> http://www.tls.net
> 
> 
> -----Original Message-----
> From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> Sent: Monday, December 03, 2012 12:31 PM
> To: Cloudstack Users
> Cc: Cloudstack Developers
> Subject: RE: XenServer & VM Snapshots
> 
> Hi Matthew,
> 
> Your analysis is correct, except for the following:
> 
> > I must mention that the same Compute Node that ran sparse_dd or mounted
> > Secondary Storage is not always the same. It appears the Management Server
> > is simply round-robining through the list of Compute Nodes and using the
> > first one that is available.
> 
> I'm pretty sure mount and copy are using the same XenServer host.
> 
> I think the issue is that the backup takes a long time because the data
> volume is big and the network rate is low.
> You can increase "BackupSnapshotWait" in the global configuration table to
> let the backup operation finish.
> 
> 
> Since CS takes advantage of XenServer's VHD image format and uses VHD to do
> snapshots and clones, snapshots must be backed up through a XenServer host.
> The ideal solution for this issue might be to leverage the storage server's
> snapshot and clone functionality, so that the snapshot backup is executed
> by the storage host, relieving some of this limitation.
> CS doesn't currently support this, but it should not be hard to support
> once Edison finishes the storage framework change; it should be just
> another storage plug-in. When CS uses the storage server's snapshot and
> clone functions, it will need to consider the storage server's limits on
> the number of snapshots and volumes.
> 
> 
> Anthony
> 
> From: Matthew Hartmann [mailto:mhartmann@tls.net]
> Sent: Monday, December 03, 2012 9:08 AM
> To: Cloudstack Users
> Cc: Cloudstack Developers
> Subject: XenServer & VM Snapshots
> 
> Hello! I'm hoping someone can help me troubleshoot the following issue:
> 
> I have a client who has a 960G data volume which contains their VM's
> Exchange Data Store. When starting a snapshot, I found that a process titled
> "sparse_dd" is started on one of my Compute Nodes. I found that this process
> then sends the output of "sparse_dd" through another Compute Node's xapi
> before placing it into the "snapshot store" on Secondary Storage. This
> appears to be part of the bottleneck, as all of our systems are connected
> via gigabit links and it should not take 15+ hours to create a snapshot.
> The following is the behavior that I have analyzed within my environment:
> 
> 
> 1)     Snapshot is started (either manually or on a schedule).
> 
> 2)     Compute Node 1 "processes the snapshot" by exposing the VDI, from
> which "sparse_dd" then creates a "thin provisioned" snapshot.
> 
> 3)     The output of sparse_dd is delivered over HTTP to xapi on Compute
> Node 2, where the Management Server mounted Secondary Storage.
> 
> 4)     Compute Node 2 (receiving the snapshot via xapi) stores the snapshot
> in the Secondary Storage mount point.
> 
> Based on this behavior, I have devised the following logic that I believe
> CloudStack is utilizing:
> 
> 
> 1)     CloudStack creates a "snapshot VDI" via the XenServer Pool Master's
> API.
> 
> 2)     CloudStack finds a Compute Node that can mount Secondary Storage.
> 
> 3)     CloudStack finds a Compute Node that can run "sparse_dd".
> 
> 4)     CloudStack uses an available Compute Node to output the VDI to xapi
> on the Compute Node that mounted Secondary Storage.
> 
> I must mention that the same Compute Node that ran sparse_dd or mounted
> Secondary Storage is not always the same. It appears the Management Server
> is simply round-robining through the list of Compute Nodes and using the
> first one that is available.
> 
> Does anyone have any input on the issue I'm having or analysis of how
> CloudStack/XenServer snapshots operate?
> 
> Thanks!
> 
> Cheers,
> 
> Matthew
> 
> 
> 
> Matthew Hartmann
> Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> 



RE: XenServer & VM Snapshots

Posted by Anthony Xu <Xu...@citrix.com>.
CS 3.0.2 is too old a version.

I'm pretty sure mount & copy happen on the same host in 3.0.4 and 3.0.5.
If mount & copy were on different hosts, this issue would be very likely to happen.
I haven't heard of this issue from QA or users.

I just checked the vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins,
which mounts secondary storage just before running sparse_dd.

I recommend that you upgrade to a newer version.

If you still see the issue, please post the related management server log and
/var/log/SMlog from XenServer.


Anthony



RE: XenServer & VM Snapshots

Posted by Anthony Xu <Xu...@citrix.com>.
CS 3.0.2 is too old version. 

I'm pretty sure mount & copy on the same host in 3.0.4 and 3.0.5.
If mount & copy might be on different hosts, the issue is very likely to happen.
I didn't hear this issue from QA and users.

I just checked vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins, 
Which mounts secondary storage just before sparse-dd.

I recommend you to upgrade new version.

If you still see the issue,

Please post related management server log and /var/log/SMlog in XenServer.


Anthony











> -----Original Message-----
> From: Matthew Hartmann [mailto:mhartmann@tls.net]
> Sent: Monday, December 03, 2012 10:31 AM
> To: cloudstack-users@incubator.apache.org
> Cc: 'Cloudstack Developers'
> Subject: RE: XenServer & VM Snapshots
> 
> Anthony:
> 
> Thank you for the prompt and informative reply.
> 
> > I'm pretty sure mount and copy are using the same XenServe host.
> 
> The behavior I have witnessed with CS 3.0.2 is that it doesn't always
> do the
> mount & copy on the same host. Out of the 12 tests I've performed, only
> once
> was the mount & copy performed on the same host that the VM was running
> on.
> 
> > I think the issue is the backup takes a long time because the data
> volume
> is big and network rate is low.
> > You can increase "BackupSnapshotWait" in global configuration table
> to let
> the backup operation finish.
> 
> I increased this in global settings from the default of 9 hours to 16
> hours.
> The snapshot still doesn't complete on time; it on average copies about
> ~460G before it times out. I'm pretty confident the network rate isn't
> the
> bottle neck as ISOs and imported VHDs install quickly. We have the
> Secondary
> Storage server set as the only internal site allowed to host files. I
> upload
> my ISO or VHD to Secondary Storage server and install using SSVM which
> completes in a very timely manner. With a 1Gb network link, 1TB should
> copy
> in roughly 2 hours (if the link is saturated by the copy process); I've
> only
> found snapshotting (template creation appears to work flawlessly) to
> take an
> insanely long time to complete.
> 
> Is there anything else I can do to increase performance or logs I
> should
> check?
> 
> Cheers,
> 
> Matthew
> 
> 
> Matthew Hartmann
> Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> 
> TLS.NET, Inc.
> http://www.tls.net
> 
> 


RE: XenServer & VM Snapshots

Posted by Anthony Xu <Xu...@citrix.com>.
VM snapshot is a nice feature to have, but I think it will hit the same issue as volume snapshots when a VM snapshot is backed up to secondary storage.

There is a lot of room to improve vdi-copy, and right now the slowness is not caused by coalescing. Currently vdi-copy goes through all the layers of blktap2, and the context switches consume most of the CPU cycles. If vdi-copy could work on the VHD chain directly, it would improve a lot.

If you compare the performance of vdi-copy and "vhd-util coalesce" on the same VHD chain, you'll see how much it can improve; it's a lot.


Anthony
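The vdi-copy versus "vhd-util coalesce" comparison suggested above can be scripted with a small timing helper. This is only a sketch: it times whatever command it is given and reports throughput for a byte count you supply; the real `xe vdi-copy` and `vhd-util coalesce` argument lists are deliberately not shown and are left to the reader. Because coalescing rewrites the VHD chain, any such test should be run against a disposable copy of the chain, never a live one.

```python
import subprocess
import sys
import time
from typing import Sequence


def time_command(argv: Sequence[str], bytes_processed: int) -> tuple:
    """Run argv to completion; return (elapsed seconds, MB/s for bytes_processed)."""
    start = time.monotonic()
    subprocess.run(list(argv), check=True)  # raises CalledProcessError on non-zero exit
    elapsed = time.monotonic() - start
    return elapsed, bytes_processed / 1e6 / elapsed


if __name__ == "__main__":
    # Placeholder command only; substitute the vdi-copy or vhd-util invocation to test.
    secs, rate = time_command([sys.executable, "-c", "pass"], bytes_processed=10**9)
    print(f"{secs:.2f} s, {rate:.0f} MB/s")
```

Running both tools through the same helper on the same chain keeps the comparison like-for-like.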



RE: XenServer & VM Snapshots

Posted by Mice Xia <mi...@tcloudcomputing.com>.
Anthony,

This is one of the reasons I'm working on VM snapshots on primary storage (PS) instead of volume snapshots.

I don't think it's easy to improve vdi-copy, considering it needs to coalesce incremental snapshots and verify the result.

mice




RE: XenServer & VM Snapshots

Posted by Anthony Xu <Xu...@citrix.com>.
You are right, vdi-copy is slow. We have reported this to the XenServer team and they are working on it, but no timeline or roadmap has been provided so far.


Anthony



RE: XenServer & VM Snapshots

Posted by Mice Xia <mi...@tcloudcomputing.com>.
Taking a volume snapshot is slow if your volume is huge. The reason is that vdi-copy, which is used to back up snapshots to secondary storage (SS), has a performance problem.

You can't speed it up much for a full snapshot. Perhaps you can try increasing dom0 memory, or adjust the ratio between full and incremental snapshots to reduce the number of full snapshots.

Mice
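The second suggestion trades fewer full backups for longer incremental chains. A back-of-the-envelope sketch, assuming each full snapshot is followed by at most `max_deltas` incremental ones (in CloudStack this is believed to correspond to the `snapshot.delta.max` global setting; treat that name as an assumption), and using hypothetical sizes rather than measured ones:

```python
def full_backup_fraction(max_deltas: int) -> float:
    """Share of backups that are full when each full is followed by max_deltas incrementals."""
    return 1 / (max_deltas + 1)


def avg_gb_per_backup(full_gb: float, delta_gb: float, max_deltas: int) -> float:
    """Average data copied to secondary storage per scheduled backup."""
    f = full_backup_fraction(max_deltas)
    return f * full_gb + (1 - f) * delta_gb


# Hypothetical: a 960 GB volume with 20 GB of changed data per incremental.
for n in (4, 16):
    print(n, round(avg_gb_per_backup(960, 20, n), 1))
```

This lowers the average amount copied per backup, but it does not make any single full backup faster: the full 960 GB copy still has to finish within the backup timeout.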


-----Original Message-----
From: Matthew Hartmann [mailto:mhartmann@tls.net]
Sent: 2012-12-4 (Tuesday) 2:31
To: cloudstack-users@incubator.apache.org
Cc: 'Cloudstack Developers'
Subject: RE: XenServer & VM Snapshots
 
Anthony:

Thank you for the prompt and informative reply.

> I'm pretty sure mount and copy are using the same XenServer host.

The behavior I have witnessed with CS 3.0.2 is that it doesn't always do the
mount & copy on the same host. Out of the 12 tests I've performed, only once
was the mount & copy performed on the same host that the VM was running on.

> I think the issue is the backup takes a long time because the data volume
> is big and network rate is low.
> You can increase "BackupSnapshotWait" in global configuration table to let
> the backup operation finish.

I increased this in global settings from the default of 9 hours to 16 hours.
The snapshot still doesn't complete in time; on average it copies about
460G before it times out. I'm pretty confident the network rate isn't the
bottleneck, as ISOs and imported VHDs install quickly. We have the Secondary
Storage server set as the only internal site allowed to host files. I upload
my ISO or VHD to the Secondary Storage server and install it using the SSVM,
which completes in a very timely manner. With a 1Gb network link, 1TB should
copy in roughly 2 hours (if the link is saturated by the copy process); I've
only found snapshotting (template creation appears to work flawlessly) to take
an insanely long time to complete.
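The figures in this message can be checked with a little arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits/s) and that the ~460G was copied over the full 16-hour timeout window:

```python
def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s / 3600


def observed_rate_bits_per_s(copied_bytes: float, elapsed_hours: float) -> float:
    """Average throughput implied by copying copied_bytes in elapsed_hours."""
    return copied_bytes * 8 / (elapsed_hours * 3600)


GB = 10**9  # decimal gigabyte (assumption)

ideal = transfer_hours(1000 * GB, 10**9)          # 1 TB over a saturated 1 Gb/s link
observed = observed_rate_bits_per_s(460 * GB, 16)  # ~460 GB before the 16 h timeout
projected = transfer_hours(960 * GB, observed)     # whole 960 GB volume at that rate

print(f"1 TB at 1 Gb/s:      {ideal:.1f} h")
print(f"observed rate:       {observed / 1e6:.0f} Mb/s")
print(f"960 GB at that rate: {projected:.1f} h")
```

The ideal case works out to about 2.2 hours, while the observed average of roughly 64 Mb/s is well under gigabit speed; at that rate the whole 960 GB volume would need about 33 hours, consistent with the 16-hour timeout being hit about halfway through.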

Is there anything else I can do to increase performance, or any logs I should
check?

Cheers,

Matthew


Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net

TLS.NET, Inc.
http://www.tls.net


-----Original Message-----
From: Anthony Xu [mailto:Xuefei.Xu@citrix.com] 
Sent: Monday, December 03, 2012 12:31 PM
To: Cloudstack Users
Cc: Cloudstack Developers
Subject: RE: XenServer & VM Snapshots

Hi Matthew,

Your analysis is correct except for the following:

> I must mention that the same Compute Node that ran sparse_dd or mounted
> Secondary Storage is not always the same. It appears the Management Server
> is simply round-robining through the list of Compute Nodes and using the
> first one that is available.

I'm pretty sure mount and copy are using the same XenServer host.

I think the issue is that the backup takes a long time because the data volume
is big and the network rate is low.
You can increase "BackupSnapshotWait" in the global configuration table to let
the backup operation finish.
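Because the timeout has to cover the entire copy, one way to choose a value is to work backwards from the volume size and the measured throughput. This is a sketch only: the setting is assumed to be stored under the `backup.snapshot.wait` key (the `BackupSnapshotWait` entry) in seconds, and the 25% safety margin is an arbitrary choice.

```python
import math


def backup_wait_seconds(volume_gb: float, rate_mb_per_s: float, margin: float = 1.25) -> int:
    """Seconds needed to copy volume_gb at rate_mb_per_s, padded by margin and rounded up."""
    copy_seconds = volume_gb * 1000 / rate_mb_per_s  # decimal units: 1 GB = 1000 MB
    return math.ceil(copy_seconds * margin)


# 960 GB at roughly 8 MB/s, the rate implied by ~460 GB copied in 16 hours.
wait = backup_wait_seconds(960, 8)
print(wait, "seconds =", round(wait / 3600, 1), "hours")
```

Raising the timeout only lets a slow copy run to completion; it does nothing about the vdi-copy throughput itself.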


Since CS takes advantage of XenServer's VHD image format and uses VHD for
snapshots and clones, snapshots have to be backed up through a XenServer host.
The ideal solution for this issue might be to leverage the storage server's
snapshot and clone functionality, so that the snapshot backup is executed by
the storage host, relieving some of the limitation.
Currently CS doesn't support this, but it should not be hard to add after
Edison finishes the storage framework change; it should be just another
storage plug-in.
When CS uses the storage server's snapshot and clone functions, it will need to
take into account the storage server's limits on the number of snapshots and
volumes.


Anthony














From: Matthew Hartmann [mailto:mhartmann@tls.net]
Sent: Monday, December 03, 2012 9:08 AM
To: Cloudstack Users
Cc: Cloudstack Developers
Subject: XenServer & VM Snapshots

Hello! I'm hoping someone can help me troubleshoot the following issue:

I have a client who has a 960G data volume which contains their VM's
Exchange Data Store. When starting a snapshot, I found that a process named
"sparse_dd" is started on one of my Compute Nodes. This process then sends
its output through another Compute Node's xapi before it is placed into the
"snapshot store" on Secondary Storage. This appears to be part of the
bottleneck, as all of our systems are connected via gigabit links and a
snapshot should not take 15+ hours to create. The following is the behavior
I have observed in my environment:


1)     Snapshot is started (either manually or on a schedule).

2)     Compute Node 1 "processes the snapshot" by exposing the VDI, from which
"sparse_dd" then creates a "thin provisioned" snapshot.

3)     The output of sparse_dd is delivered over HTTP to xapi on Compute
Node 2 where the Management Server mounted Secondary Storage.

4)     Compute Node 2 (receiving the snapshot via xapi) stores the snapshot
in the Secondary Storage mount point.
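As a rough sanity check on the "should not take 15+ hours" point, the arithmetic can be written out. This is a sketch that assumes decimal units (1 GB = 1000 MB, 1 Gbit/s = 125 MB/s) and a fully saturated link, which a real copy through xapi and NFS will not reach; it only bounds the problem. The 960 GB and 15-hour figures come from the post above.

```python
# Back-of-the-envelope check: how long should a full copy take on a
# saturated link, and what average rate does an observed duration imply?
# Decimal units throughout; protocol and disk overhead are ignored.

GIGABIT_MB_PER_S = 1000 / 8  # 1 Gbit/s expressed in MB/s (125.0)


def ideal_copy_hours(size_gb: float, link_mb_per_s: float = GIGABIT_MB_PER_S) -> float:
    """Hours needed to move size_gb at a fully saturated link rate."""
    return size_gb * 1000 / link_mb_per_s / 3600


def effective_mb_per_s(size_gb: float, hours: float) -> float:
    """Average rate actually achieved when size_gb took `hours` to copy."""
    return size_gb * 1000 / (hours * 3600)


if __name__ == "__main__":
    # 960 GB volume from the post, on a saturated gigabit link.
    print(f"ideal: {ideal_copy_hours(960):.2f} h")
    # The same 960 GB taking 15 hours implies this average rate.
    print(f"observed: {effective_mb_per_s(960, 15):.1f} MB/s")
```

With these numbers, a saturated gigabit link would move 960 GB in about 2.1 hours, while a 15-hour copy averages roughly 17.8 MB/s, about 14% of line rate.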

Based on this behavior, I have deduced the following logic that I believe
CloudStack is using:


1)     CloudStack creates a "snapshot VDI" via the XenServer Pool Master's API.

2)     CloudStack finds a Compute Node that can mount Secondary Storage.

3)     CloudStack finds a Compute Node that can run "sparse_dd".

4)     CloudStack uses an available Compute Node to output the VDI to xapi on
the Compute Node that mounted Secondary Storage.
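Step 3 depends on sparse_dd producing a "thin provisioned" copy. The general idea behind a sparse copy can be illustrated as below; this is not XenServer's actual sparse_dd tool, only a sketch of the technique: read fixed-size blocks, skip blocks that are entirely zero, and leave holes in the destination file. The 1 MiB block size is an arbitrary assumption.

```python
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB; arbitrary choice for this sketch


def sparse_copy(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src_path to dst_path, skipping all-zero blocks.

    Skipped blocks become holes in the destination on filesystems that
    support sparse files. Returns the number of bytes actually written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block.count(0) == len(block):
                # All zeros: advance the output position without writing.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
                written += len(block)
        # Set the final size explicitly in case the source ends in zeros.
        dst.truncate(src.tell())
    return written
```

Every block still has to be read, but only non-zero blocks are written, so a mostly empty volume yields a small output; a volume largely filled with data, such as a busy Exchange store, gains little from this.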

I must mention that the same Compute Node that ran sparse_dd or mounted
Secondary Storage is not always the same. It appears the Management Server
is simply round-robining through the list of Compute Nodes and using the
first one that is available.
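Whether mount and copy share a host can be settled empirically by recording, for each snapshot run, the host the VM was on, the host where sparse_dd appeared, and the host that mounted Secondary Storage. A minimal tally is sketched below with hypothetical host names; how those hosts are identified (process listings, mount tables, or management server logs) is left open.

```python
from typing import NamedTuple


class SnapshotRun(NamedTuple):
    """Hosts observed for one snapshot backup (host names are hypothetical)."""
    vm_host: str         # host the VM was running on
    sparse_dd_host: str  # host where the sparse_dd process appeared
    mount_host: str      # host that mounted Secondary Storage


def tally(runs: list[SnapshotRun]) -> dict[str, int]:
    """Summarize how often the copy, mount, and VM hosts coincided."""
    return {
        "runs": len(runs),
        # If copy and mount always share a host, this equals "runs".
        "copy_and_mount_same_host": sum(r.sparse_dd_host == r.mount_host for r in runs),
        "mount_on_vm_host": sum(r.mount_host == r.vm_host for r in runs),
        "distinct_mount_hosts": len({r.mount_host for r in runs}),
    }


if __name__ == "__main__":
    # Hypothetical observations, not real data.
    runs = [
        SnapshotRun("xen1", "xen1", "xen1"),
        SnapshotRun("xen1", "xen2", "xen3"),
        SnapshotRun("xen2", "xen3", "xen3"),
    ]
    print(tally(runs))
```

If copy_and_mount_same_host equals runs, the two steps always shared a host; several distinct mount hosts would be consistent with the host being picked from the pool rather than fixed.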

Does anyone have any input on the issue I'm having or analysis of how
CloudStack/XenServer snapshots operate?

Thanks!

Cheers,

Matthew



Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net

RE: XenServer & VM Snapshots

Posted by Anthony Xu <Xu...@citrix.com>.
Hi Matthew,

Your analysis is correct except for the following:

> I must mention that the same Compute Node that ran sparse_dd or mounted Secondary Storage is not always the same. It appears the Management Server is simply round-robining through the list of Compute Nodes and using the first one that is available.

I'm pretty sure mount and copy are using the same XenServer host.

I think the issue is that the backup takes a long time because the data volume is big and the network rate is low.
You can increase "BackupSnapshotWait" in the global configuration table to let the backup operation finish.
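Rather than raising the timeout by guesswork, it can be sized from the volume size and the copy rate actually observed, plus headroom. The sketch below assumes the setting referred to as BackupSnapshotWait is stored under the key `backup.snapshot.wait` in seconds, and that it is changed through the CloudStack API's `updateConfiguration` command using the usual apiKey plus HMAC-SHA1 request signing. The key name, units, endpoint, and keys are assumptions or placeholders to check against the installed version, and a changed global setting may require a management server restart to take effect.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def required_wait_seconds(size_gb: float, observed_mb_per_s: float,
                          headroom: float = 1.5) -> int:
    """Seconds needed to copy size_gb at the observed rate, times a safety factor."""
    if observed_mb_per_s <= 0:
        raise ValueError("observed rate must be positive")
    return int(size_gb * 1000 / observed_mb_per_s * headroom)


def signed_query(params: dict[str, str], secret_key: str) -> str:
    """Return a CloudStack-style signed query string.

    Keys are lower-cased and sorted, values are URL-encoded (spaces as %20),
    the whole string is lower-cased and signed with HMAC-SHA1, and the
    base64 signature is URL-encoded and appended.
    """
    pairs = sorted((k.lower(), quote(str(v), safe="")) for k, v in params.items())
    query = "&".join(f"{k}={v}" for k, v in pairs)
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


if __name__ == "__main__":
    # Example: 960 GB at an assumed 8 MB/s average, with 50% headroom.
    wait = required_wait_seconds(960, 8)
    params = {
        "command": "updateConfiguration",
        "name": "backup.snapshot.wait",  # assumed key behind BackupSnapshotWait
        "value": str(wait),
        "apiKey": "YOUR_API_KEY",        # placeholder
        "response": "json",
    }
    # Placeholder endpoint; send the resulting URL with any HTTP client.
    print("http://MANAGEMENT_SERVER:8080/client/api?" + signed_query(params, "YOUR_SECRET_KEY"))
```

At an assumed average of 8 MB/s, the 960 GB volume works out to 180000 seconds (50 hours) with 50% headroom; at a saturated gigabit rate with no headroom it would be 7680 seconds.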


Since CS takes advantage of XenServer's VHD image format and uses VHD to do snapshots and clones, snapshots have to be backed up through a XenServer host.
The ideal solution for this issue might be to leverage the storage server's snapshot and clone functionality; the snapshot backup would then be executed by the storage host, relieving some of this limitation.
Currently CS doesn't support this, but it should not be hard to support once Edison finishes the storage framework change; it should be just another storage plug-in.
When CS uses the storage server's snapshot and clone functions, it needs to consider the storage server's limits on the number of snapshots and the number of volumes.


Anthony