Posted to dev@cloudstack.apache.org by Marcus Sorensen <sh...@gmail.com> on 2012/10/12 10:09:21 UTC

[RFC] QinQ vlans support

Guys, while looking for a free and scalable way to provide private networks
for customers, I've been running a QinQ setup that has been working quite
well. I've sort of laid the groundwork for it already in changing the
bridge naming conventions about a month ago for KVM (to names that won't
collide if the same vlan is used twice on different physical interfaces).

Basically the way it works is like this. Linux has two ways of naming
tagged network devices: the eth#.# style and the less used vlan# style. I
have a tiny patch that causes cloudstack to treat vlan# devs as though they
were physical NICs. In this way, you can do something like physical devices
eth0, eth1, and vlan400: management traffic on eth0's bridge, storage on
eth1.102's bridge, maybe eth1.103 for public/guest, then create, say, a
vlan400 that is tag 400 on eth1. You add a traffic type of guest to it and
give it a vlan range, say 10-4000. Then you end up with cloudstack handing
out vlan400.10, vlan400.11, etc. for guest networks. Works great for network
isolation without burning through a bunch of your "real" vlans. In the
unlikely event that you run out, you just create a physical vlan401 and
start over with the vlan numbers.
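For anyone who wants to see it concretely, here is roughly the host-side
setup done by hand (a sketch only; in practice the admin creates the outer
vlan400 device, cloudstack creates the inner devices and bridges itself,
and the bridge name below is illustrative):

# outer tag: created once by the admin, treated as a "physical" dev
ip link add link eth1 name vlan400 type vlan id 400
ip link set vlan400 up

# inner tag: effectively what cloudstack does for a guest network on tag 10
ip link add link vlan400 name vlan400.10 type vlan id 10
brctl addbr brvlan400-10
brctl addif brvlan400-10 vlan400.10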

In theory this gives you all-you-can-eat isolated networks without having
to configure hundreds of vlans on your networking equipment. This may
require additional config on any upstream switches to pass the double tags
around, but in general, from what I've seen, the inner tags just pass
through on anything layer 2; it should only get tricky if you try to
tunnel, route, or strip tags.
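If you want to verify that a given switch path really passes the double
tags, a quick sanity check is to watch for both tags on a host at the far
end (tcpdump's vlan filter can be stacked, outer tag first):

# -e prints the link-level header so the 802.1Q tags are visible
tcpdump -e -n -i eth1 'vlan 400 and vlan 10'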

This is especially nice with system VM routers and VPC (cloudstack takes
care of everything), but admittedly external routers probably will have
spotty support for routing double-tagged traffic. I'm also a bit afraid
that if I were to get it merged it would just become this undocumented
hack thing that few know about and nobody uses. So I'm looking for
feedback on whether this sounds useful enough to commit, how it should be
documented, and whether it makes sense to hint at this in the GUI somehow.

Re: [RFC] QinQ vlans support

Posted by Marcus Sorensen <sh...@gmail.com>.
I've been thinking more about this q-in-q and how to make it compatible
with everyone. I think a simple modification to my existing work would do
the trick: some sort of configuration option in the form of a list,
something like 'qinq-physdevs'. Then I'd do the same thing as now, but
check against the list rather than relying on the "vlan" name.

So for example someone could set up 'eth0.100' as a qinq-physdev, and
cloudstack would treat it like a physical device, create tagged vlans
on it, etc, just like my existing patch does.

The one question I have is where to put it. I haven't spent much time
looking at configurations and what is appropriate to put where. From what
I can tell there are three main locations for configuration (probably all
end up in the database): the first is the global config options in the UI;
the second is the various configs set up during zone creation, network
creation, etc., also in the UI; the third is the local config files such
as what's in /etc/cloud. I'd lean toward putting this in
/etc/cloud/agent.properties, since that's where the bridge stuff lives as
well, but if anyone has a suggestion on where this should go it would be
appreciated.
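To make the idea concrete, it might look something like this in
agent.properties (the key doesn't exist yet; the name and format are just
a strawman):

# hypothetical setting: tagged devices to treat as physical devices,
# so cloudstack will create its own tagged bridges on top of them
qinq-physdevs=eth0.100,vlan400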

On Mon, Oct 22, 2012 at 2:25 PM, Marcus Sorensen <sh...@gmail.com> wrote:
> Here's my rough draft:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Q-in-Q+for+isolated+networks+functional+spec

Re: [RFC] QinQ vlans support

Posted by Marcus Sorensen <sh...@gmail.com>.
Here's my rough draft:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Q-in-Q+for+isolated+networks+functional+spec

On Sun, Oct 21, 2012 at 11:41 PM, Chiradeep Vittal
<Ch...@citrix.com> wrote:
> +1 on the FS.

Re: [RFC] QinQ vlans support

Posted by Chiradeep Vittal <Ch...@citrix.com>.
+1 on the FS.



Re: [RFC] QinQ vlans support

Posted by Marcus Sorensen <sh...@gmail.com>.
The admin does have to create a new physical network; the patch just
allows you to use a tagged network as that physical network rather than a
real eth device. It is true that cloudstack doesn't know about q-in-q per
se, but it is the one creating the q-in-q vlans. The admin does have to
create any "vlan#" devs to be used, but I think that makes sense since
cloudstack doesn't manage any of your physical network devices. Perhaps I
need to write a bit of a functional spec just to describe it in more
detail.

I haven't done anything with it in regard to xen; of course that would
also be a different patch since it hits different code. If someone knows
that code well maybe they can help. This is a simple patch, but it's made
possible by a previous patch that reworks how the bridges are named, so
enabling it for xen might not be as simple as this makes it look.
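As an aside, the out-of-band part is just ordinary vlan device setup. On a
Debian-style host, for example, the admin might persist the outer device
with something like this in /etc/network/interfaces (requires the vlan
package; the names are the examples from this thread):

auto vlan400
iface vlan400 inet manual
    vlan-raw-device eth1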


Re: [RFC] QinQ vlans support

Posted by Chiradeep Vittal <Ch...@citrix.com>.
It looks like your patch does not require the admin to configure anything
wrt physical networks. The admin knows the list of "outer" VLANs and
CloudStack is blissfully unaware of the QinQ stuff. This requires the
hypervisors to be independently configured (out-of-band) with the outer
VLAN bridges?
It also looks like this is a KVM-only solution.
Have you tried this with XS?



Re: [RFC] QinQ vlans support

Posted by Marcus Sorensen <sh...@gmail.com>.
Ah, well it's pretty simple, so I'll just paste it here. Again,
perhaps more should be implemented regarding the MTU (like
functionality to configure MTU on the virtual router), but if you know
what to do it can all work via switch configs.

diff --git a/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java b/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
index 1bc70fa..70de3db 100755
--- a/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
+++ b/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
@@ -800,7 +800,7 @@ public class LibvirtComputingResource extends ServerResourceBase implements
         String pif = Script.runSimpleBashScript("brctl show | grep " + bridge + " | awk '{print $4}'");
         String vlan = Script.runSimpleBashScript("ls /proc/net/vlan/" + pif);
 
-        if (vlan != null && !vlan.isEmpty()) {
+        if (vlan != null && !vlan.isEmpty() && (!pif.startsWith("vlan") || pif.matches("vlan\\d+\\.\\d+"))) {
                 pif = Script.runSimpleBashScript("grep ^Device\\: /proc/net/vlan/" + pif + " | awk {'print $2'}");
         }
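
To see what the change does, here is the same lookup done by hand (these
are the commands the code runs; the vlan400/vlan400.10 names are the
examples from earlier in the thread, and the bridge name is illustrative):

# find the bridge's port (pif); assume this prints vlan400.10
brctl show | grep brvlan400-10 | awk '{print $4}'

# vlan400.10 matches vlan\d+\.\d+, so it still resolves one step up
# to its parent device:
grep ^Device: /proc/net/vlan/vlan400.10 | awk '{print $2}'    # -> vlan400

# vlan400 starts with "vlan" and is not double tagged, so the new
# condition skips this lookup and vlan400 is treated as the physical
# interface (previously it would have resolved on to eth1)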

On Thu, Oct 18, 2012 at 8:05 AM, Chip Childers
<ch...@sungard.com> wrote:
> On Thu, Oct 18, 2012 at 12:42 AM, Marcus Sorensen <sh...@gmail.com> wrote:
>> Sorry, I've been up to my ears. I've attached the simple patch that
>> makes this all happen, if anyone wants to take a look. This is the
>> code that looks for physical devices. It's passed a bridge; it
>> determines the parent of that bridge, then checks whether that parent
>> is a tagged device, and if so goes one more step and finds its parent.
>> This just
>> circumvents the last lookup if the parent of the bridge is a "vlan"
>> device (single tagged, e.g. vlan100) but not a double-tagged one (e.g.
>> vlan100.10), and the rest of cloudstack treats vlan100 as though it
>> were a physical device, creates tagged bridges on it if it has guest
>> traffic type, etc. I've been using it in our test bed for about a
>> month, and have only run into the MTU issue.
>
> Hey Marcus,
>
> Attachments get stripped.  Can you post it somewhere?
>

Re: [RFC] QinQ vlans support

Posted by Chip Childers <ch...@sungard.com>.
On Thu, Oct 18, 2012 at 12:42 AM, Marcus Sorensen <sh...@gmail.com> wrote:
> Sorry, I've been up to my ears. I've attached the simple patch that
> makes this all happen, if anyone wants to take a look.

Hey Marcus,

Attachments get stripped.  Can you post it somewhere?


Re: [RFC] QinQ vlans support

Posted by Marcus Sorensen <sh...@gmail.com>.
Sorry, I've been up to my ears. I've attached the simple patch that
makes this all happen, if anyone wants to take a look. This is the
code that looks for physical devices. It's passed a bridge, determines
the parent of that bridge, then checks whether that parent is a tagged
device and, if so, goes one step further to find its parent. The patch
just circumvents that last lookup when the parent of the bridge is a
"vlan" device (single-tagged, e.g. vlan100) but not a double-tagged
one (e.g. vlan100.10); the rest of cloudstack then treats vlan100 as
though it were a physical device, creates tagged bridges on it if it
carries guest traffic, and so on. I've been using it in our test bed
for about a month, and have only run into the MTU issue.
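
In shell terms, the lookup in question does roughly the following (a
paraphrase of the Java above, assuming the bridge name is in $bridge):

pif=$(brctl show | grep $bridge | awk '{print $4}')  # the bridge's uplink
if [ -e "/proc/net/vlan/$pif" ]; then
    # $pif is a tagged device, so normally we resolve one more level
    # to find the real NIC. The patch skips this step when $pif is a
    # single-tagged vlan# device (e.g. vlan100), which is what lets it
    # be treated as physical.
    pif=$(grep '^Device:' "/proc/net/vlan/$pif" | awk '{print $2}')
fi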

If people still think it's a good idea, I'll create a functional spec
and additional info on how it works.

I've also got a small patch to modifyvlans.sh, but I'm debating
whether or not it's necessary. It detects whether the "physical
interface" is actually a vlan-tagged interface, and if so it subtracts
the necessary bytes from the MTU when it sets up the double-tagged
bridges. It's technically not necessary, as the important part is
whether the guest MTUs fit inside the MTU that the switch allows once
the extra tag is added, but it makes what's needed a bit more obvious.
However, it also breaks the admin's ability to bump the switch MTUs up
just a bit, say to 1532, to account for the excess without having to
go all the way up to 9000 or full jumbo. If anyone is a network guru
and has any feedback, it would be appreciated, but I'm inclined to
leave the MTUs alone and write into the functional spec that a switch
with a 1500 MTU supports double tags up to 1468, and a switch with a
9000 MTU supports VM guest networks up to 8968 MTU.
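
For concreteness, the modifyvlans.sh change I'm describing amounts to
something like this sketch (variable names are illustrative; 4 bytes
is the size of one 802.1Q tag, so the spec figures above budget more
conservatively than strictly required):

# If the "physical" interface is itself a vlan device, shrink the
# double-tagged child by one tag's worth so the outer frame still fits.
parent_mtu=$(cat /sys/class/net/$phydev/mtu)
if [ -e "/proc/net/vlan/$phydev" ]; then
    ip link set dev "$phydev.$vlanid" mtu $((parent_mtu - 4))
fi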


Re: [RFC] QinQ vlans support

Posted by Marcus Sorensen <sh...@gmail.com>.
Ok, I'll pull out the changes and let people see them. Cloudstack
seems to let me put the same vlan ranges on multiple physicals, though
I haven't done much actual testing with large numbers of vlans. I
imagine there would be other bottlenecks if they all needed to be up
on the same host at once. Luckily, we only create bridges for the VMs
actually on the box, so it should scale reasonably.

The only caveat I've run into so far is that you either need to be
running jumbo frames on your switches, or turn down the MTU on the
guests a bit to accommodate the space taken by the extra tag. If you
wanted to run jumbo frames on the guests as well, you'd run into the
same situation and have to use slightly less than 9000 (although the
virtual router would also require a patch for the new size).
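
For reference, the guest-side adjustment is just a one-liner, run
inside the guest; 1496 is illustrative, assuming switches left at the
default 1500 and the standard 4 bytes for the extra tag:

ip link set dev eth0 mtu 1496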


Re: [RFC] QinQ vlans support

Posted by Ahmad Emneina <Ah...@citrix.com>.
+1

This actually sounds amazing, Marcus. I'd love to see and use this
implementation.

-- 
Æ

Re: [RFC] QinQ vlans support

Posted by "Kelceydamage@bbits" <ke...@bbits.ca>.
That's a far more elegant way than what I tried, which was creating tagged interfaces within guests.

Sent from my iPhone


Re: [RFC] QinQ vlans support

Posted by Chiradeep Vittal <Ch...@citrix.com>.
This sounds like it can be modeled as multiple physical networks? That is,
each "outer" vlan (400, 401, etc) is a separate physical network in the
same zone. That could work, although it is probable that the zone
configuration API bits prevent more than 4k VLANs per zone (that can be
changed to per physical network).
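
To put rough numbers on that (illustrative, using the ranges from the
original example):

  outer vlans as physical networks: vlan400-vlan409  ->  10 networks
  inner tags per outer network:     10-4000          ->  3991 guest vlans each
  total isolated guest networks:    10 x 3991        =   39910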

As long as communication between guests on different physical networks
happens via the public network, it should be Ok.
I'd like to see the patch.

Thanks
