Posted to dev@cloudstack.apache.org by Pawit Pornkitprasan <p....@gmail.com> on 2013/06/20 06:12:14 UTC

PCI-Passthrough with CloudStack (Improved)

Hi,

Following my previous post about implementing PCI Passthrough on
CloudStack (KVM), I have taken Edison Su’s and others’ comments into
account and came up with an improved design.

Because the devices available at each agent may be different, the
devices available for passthrough are now configured in the agent
configuration file (/etc/cloudstack/agent/agent.properties). The
configuration is a comma-separated list of available PCI devices,
each paired with a given name.

pci.devices=28:00.1|10GE,28:00.2|10GE,28:00.3|10GE,28:00.4|10GE,28:00.5|10GE,28:00.6|10GE,28:00.7|10GE,28:01.0|10GE
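For illustration, the agent-side parsing of these entries could look
roughly like the sketch below (class and method names here are mine,
not necessarily those in the patch):

import java.util.ArrayList;
import java.util.List;

// Sketch: parse the pci.devices property into (PCI address, name) pairs.
public class PciDeviceConfigParser {

    public static class PciDeviceEntry {
        public final String address; // e.g. "28:00.1" (bus:slot.function)
        public final String name;    // operator-given name, e.g. "10GE"

        public PciDeviceEntry(String address, String name) {
            this.address = address;
            this.name = name;
        }
    }

    /** Parses "28:00.1|10GE,28:00.2|10GE,..." into a list of entries. */
    public static List<PciDeviceEntry> parse(String property) {
        List<PciDeviceEntry> entries = new ArrayList<PciDeviceEntry>();
        if (property == null || property.trim().isEmpty()) {
            return entries;
        }
        for (String item : property.split(",")) {
            String[] parts = item.trim().split("\\|");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "Malformed pci.devices entry: " + item);
            }
            entries.add(new PciDeviceEntry(parts[0], parts[1]));
        }
        return entries;
    }
}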

At agent startup, the list of PCI devices is parsed and sent together
with StartupRoutingCommand (in a new field, not in details). The
management server then stores it in a new table, “op_host_pci_devices”.
If a device is added, removed, or renamed, the table is updated
accordingly. The current schema has the following fields:

id (auto-increment)
host_id (host that this device belongs to)
name (given name of the PCI device)
domain (PCI ID - domain)
bus (PCI ID - bus)
slot (PCI ID - slot)
function (PCI ID - function)
instance_id (ID of the VM using the PCI device, NULL if not in use)
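Expressed as a JPA-style VO class (a hypothetical sketch mirroring the
fields above; the actual class in the patch may differ):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical VO over op_host_pci_devices; comments restate the schema.
@Entity
@Table(name = "op_host_pci_devices")
public class PciDeviceVO {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private long id;                // auto-increment

    @Column(name = "host_id")
    private long hostId;            // host that this device belongs to

    @Column(name = "name")
    private String name;            // given name of the PCI device

    @Column(name = "domain")
    private int domain;             // PCI ID - domain

    @Column(name = "bus")
    private int bus;                // PCI ID - bus

    @Column(name = "slot")
    private int slot;               // PCI ID - slot

    @Column(name = "function")
    private int function;           // PCI ID - function

    @Column(name = "instance_id")
    private Long instanceId;        // VM using the device; NULL if free
}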

The “name” of the PCI device is what is used to assign a device. In a
compute offering, the user can specify the name of one or more PCI
devices (as a comma-separated list) and CloudStack will find a host
with the PCI device of the specified name available and assign it.

A new manager, PciDeviceManager, is created to handle the allocation
of PCI devices. The manager implements StateListener, assigning PCI
devices when a VM transitions to the “Starting” state and releasing
them when the VM stops. The first-fit allocator and first-fit planner
are also modified to check for PCI device availability accordingly.
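A simplified, self-contained sketch of that state-change handling
(CloudStack's real StateListener interface takes more parameters; the
DAO below is a stand-in for access to op_host_pci_devices):

import java.util.List;

// Simplified sketch of PciDeviceManager's state-change handling.
public class PciDeviceManagerSketch {

    public enum State { Starting, Running, Stopping, Stopped }

    public interface PciDeviceDao {
        /** Finds a free device with the given name on the host, or null. */
        PciDeviceRow findFreeByHostAndName(long hostId, String name);
        /** Clears instance_id for all devices held by the VM. */
        void releaseByInstance(long vmId);
        void markUsed(PciDeviceRow row, long vmId);
    }

    public static class PciDeviceRow {
        public long id;
        public long hostId;
        public String name;
    }

    private final PciDeviceDao dao;

    public PciDeviceManagerSketch(PciDeviceDao dao) {
        this.dao = dao;
    }

    /** Invoked on VM state transitions (simplified from StateListener). */
    public void onStateChange(long vmId, long hostId,
                              List<String> requestedNames, State newState) {
        if (newState == State.Starting) {
            // Reserve one free device per name requested by the offering.
            for (String name : requestedNames) {
                PciDeviceRow dev = dao.findFreeByHostAndName(hostId, name);
                if (dev == null) {
                    throw new IllegalStateException(
                        "No free PCI device named " + name + " on host " + hostId);
                }
                dao.markUsed(dev, vmId);
            }
        } else if (newState == State.Stopped) {
            // Release every device held by this VM.
            dao.releaseByInstance(vmId);
        }
    }
}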

For migration, there are two approaches. The first approach is to
forbid migration, which is straightforward. The second approach is to
use PCI hotplug to detach the device, migrate, and re-attach it at the
other end. This will interrupt whatever is using the device in the VM.
However, it may be acceptable for networking devices, where the VM can
use a bonding device to channel network traffic through a standard
virtualized network device while the PCI passthrough device is down.

The design described here (including detach-attach migration) has
been implemented and is working. Again, comments and suggestions are
welcome.

Best Regards,
Pawit

RE: PCI-Passthrough with CloudStack (Improved)

Posted by Paul Angus <pa...@shapeblue.com>.
Hi,

Will you be looking at/documenting the need to enable PCI pass-through by creating a customised kernel for KVM hosts?

[we've needed to change the DMAR flag to 'on' by default]


Regards,

Paul Angus
S: +44 20 3603 0540 | M: +447711418784
paul.angus@shapeblue.com



Re: PCI-Passthrough with CloudStack (Improved)

Posted by Pawit Pornkitprasan <p....@gmail.com>.
Hi Paul,

I think that is largely dependent on hardware quirks. In my case, I
only had to add intel_iommu=on to the kernel command line to get PCI
passthrough working. An additional parameter, pci=nocrs, was needed to
get the SR-IOV mode of the Mellanox ConnectX-2 card working.
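For reference, on a grub2-based host that typically means editing the
kernel command line and regenerating the grub config (file locations
and commands vary by distribution):

# In /etc/default/grub, append to the existing parameters:
GRUB_CMDLINE_LINUX="... intel_iommu=on pci=nocrs"
# Then run update-grub (Debian/Ubuntu) or
# grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL/CentOS) and reboot.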

Best Regards,
Pawit


Re: PCI-Passthrough with CloudStack (Improved)

Posted by Pawit Pornkitprasan <p....@gmail.com>.
Hi,

I forgot to describe the detach-attach mechanism, so I'll describe it
in this reply.

PCI device detach is done automatically by the agent when a
MigrateCommand is processed. In the KVM case, the agent gets the list
of attached PCI devices from libvirt's domain XML and detaches every
device. If a detach fails (e.g., the guest does not have the acpiphp
module loaded), libvirt will report an error and the migration will be
canceled.
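Roughly, the detach step looks like the following sketch using the
libvirt Java bindings (error handling elided; class names are mine):

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.libvirt.Connect;
import org.libvirt.Domain;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: detach every <hostdev> (PCI passthrough) device from a domain
// before migration. libvirt throws if the guest cannot release a device
// (e.g. no acpiphp module), which cancels the migration.
public class PciDetachSketch {

    public static void detachAll(String vmName) throws Exception {
        Connect conn = new Connect("qemu:///system");
        Domain dom = conn.domainLookupByName(vmName);

        Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(dom.getXMLDesc(0))));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hostdevs = (NodeList) xpath.evaluate(
                "/domain/devices/hostdev[@type='pci']", xml, XPathConstants.NODESET);

        for (int i = 0; i < hostdevs.getLength(); i++) {
            // Serialize each <hostdev> element back to XML and detach it.
            dom.detachDevice(nodeToString(hostdevs.item(i)));
        }
    }

    private static String nodeToString(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty("omit-xml-declaration", "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }
}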

For re-attachment: before the migration starts, a PCI device on the
new host is assigned by PciDeviceManager and the one on the old host
is freed (also on a state-change event). After the migration succeeds,
PciDeviceManager sends an AttachPciDevicesCommand to the new agent
with the list of PCI IDs on the new host, and the agent instructs
libvirt to attach them.
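The attach side is symmetric: given a PCI ID on the destination host,
the agent can build a <hostdev> fragment and hand it to libvirt. A
minimal sketch (the XML layout follows libvirt's hostdev format):

import org.libvirt.Connect;
import org.libvirt.Domain;

// Sketch: attach a passthrough device on the destination host, as done
// when AttachPciDevicesCommand is processed after a migration.
public class PciAttachSketch {

    public static void attach(String vmName, int domain, int bus,
                              int slot, int function) throws Exception {
        String xml = String.format(
            "<hostdev mode='subsystem' type='pci' managed='yes'>"
            + "<source><address domain='0x%04x' bus='0x%02x'"
            + " slot='0x%02x' function='0x%x'/></source>"
            + "</hostdev>", domain, bus, slot, function);

        Connect conn = new Connect("qemu:///system");
        Domain dom = conn.domainLookupByName(vmName);
        dom.attachDevice(xml);
    }
}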

Best Regards,
Pawit
