Posted to users@cloudstack.apache.org by Andrija Panic <an...@gmail.com> on 2017/10/09 20:52:34 UTC

Help/Advice needed - some traffic don't reach VNET / VM

Hi guys,

we have an occasional but serious problem that seems to start randomly
(i.e. NOT under high load). As far as I can tell it is not ACS related but
purely KVM, so any feedback is really welcome.

- VM is reachable in general from everywhere, but not reachable from a
specific IP address ?!
- VM is NOT under high load; network traffic is next to zero, same for
CPU/disk...
- We mitigate this problem by migrating the VM away to another host, which
is not much of a solution...

Description of problem:

We ping from the "problematic" source IP address to the problematic VM, and
we capture traffic on the KVM host where the problematic VM lives:

- tcpdump on the VXLAN interface (the physical incoming interface on the
host): we see the packet fine
- tcpdump on the BRIDGE: we see the packet fine
- tcpdump on the VNET: we DON'T see the packet.

In the scenario above, I should add that:
- we can tcpdump packets from other source IPs on the VNET interface just
fine (as expected), so we should also see this problematic source IP's packets
- we can actually ping in the opposite direction - from the problematic VM to
the problematic "source" IP
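For reference, the capture steps above can be sketched roughly as follows. The interface names and source IP below are placeholders, not taken from this thread; on a real host you would find them with `ip -br link` and `virsh domiflist <vm>`. The script only prints the commands (a dry run), since the real captures need root on the live host:

```shell
# Hypothetical names for illustration only -- substitute your own.
VXLAN_IF="vxlan1001"      # VXLAN interface the packet enters on (assumption)
BRIDGE_IF="brvx1001"      # Linux bridge the VM is plugged into (assumption)
VNET_IF="vnet12"          # tap device backing the VM's NIC (assumption)
SRC_IP="203.0.113.10"     # the "problematic" source IP (documentation range)

# The same ICMP flow should be visible at all three hops; print the
# capture command for each hop rather than running it.
for IF in "$VXLAN_IF" "$BRIDGE_IF" "$VNET_IF"; do
    echo "tcpdump -ni $IF icmp and host $SRC_IP"
done
```

In the failure described in this thread, the first two captures show the packet and the third does not.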

We checked everything possible, from bridge port forwarding to mac-to-vtep
mapping and many other things: we removed traffic shaping from the VNET
interface, confirmed there are no iptables/ebtables rules and no STP on the
bridge, removed and rejoined interfaces to the bridge, and destroyed the
bridge and recreated it manually on the fly.
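A minimal sketch of the bridge-side checks mentioned above, again with hypothetical interface names (`bridge` is the standard iproute2 tool). As before, the commands are only printed, to be run by hand on the live host:

```shell
BRIDGE_IF="brvx1001"   # hypothetical bridge name (assumption)
VNET_IF="vnet12"       # hypothetical tap device (assumption)

# Dry run: print the inspection commands to run on the KVM host.
echo "bridge fdb show br $BRIDGE_IF"                  # which port each MAC was learned on
echo "bridge -d link show dev $VNET_IF"               # per-port flags: state, learning, flood
echo "cat /sys/class/net/$BRIDGE_IF/bridge/stp_state" # 0 = STP off
echo "tc qdisc show dev $VNET_IF"                     # leftover traffic shaping, if any
```

If the fdb maps the problematic source MAC to the wrong port, or the tap port is not in the forwarding state, the bridge would deliver the packet everywhere except the VNET, matching the symptom above.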

The problem is really crazy and I cannot explain it - there are no iptables
and no ebtables rules on this host (flushed for troubleshooting purposes).

This is Ubuntu 14.04, Qemu 2.5 (libvirt 1.3.1),
Stock kernel 3.16-xx, regular bridge (not OVS)

Has anyone else ever heard of such a problem? This is not intermittent packet
dropping, but a complete blackout/packet drop of some kind...

Thanks,

-- 

Andrija Panić

Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Simon Weller <sw...@ena.com.INVALID>.
Andrija,


What is the guest OS for this VM, or does this issue not discriminate?

- Si


Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Wei ZHOU <us...@gmail.com>.
Hi Andrija,

Good to see your update and know you found the root cause.

-Wei


Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Andrija Panic <an...@gmail.com>.
Hi all,

I feel obligated to share an update, to close the issue:

Nothing to do with kernel/qemu etc. It seems that hidden Docker
NAT/masquerade rules don't play nice with the VNET...

The description of the problem as given originally is still valid, but the
root cause is as above...
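As a hedged sketch of what to look for (based on Docker's documented defaults, not on details given in this thread): Docker typically sets the iptables FORWARD policy to DROP and adds MASQUERADE rules, and when the br_netfilter module is active, bridged VM traffic traverses those iptables chains and can be silently dropped. Printed as a dry run, the checks would look something like:

```shell
# Dry run: print the checks to run as root on the affected KVM host.
docker_fw_checks() {
    echo "iptables -S FORWARD"                         # Docker may set '-P FORWARD DROP'
    echo "iptables -t nat -S POSTROUTING"              # MASQUERADE rules Docker adds
    echo "sysctl net.bridge.bridge-nf-call-iptables"   # 1 = bridged traffic traverses iptables
}
docker_fw_checks
```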

Apologies for wasting everyone's time and thanks for all the inputs.

Andrija




-- 

Andrija Panić

Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Wei ZHOU <us...@gmail.com>.
Andrija,

We had a similar issue before. However, we use an advanced zone with
security groups, and the issue was that some security group rules (iptables
rules) were not applied successfully by security_group.py.
Are there any iptables rules on the hypervisors?

-Wei


Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Andrija Panic <an...@gmail.com>.
Hi,

@Wei, no, we are using VXLAN with advanced networking... The problem is
that the packet is not passed from the bridge to the VNET - that is "all"...

@Ivan, we did upgrade a few hosts to kernel 4.4 (made available from Ubuntu
16.04 to Ubuntu 14.04), but there we had some issues with FortiOS (some
special OS, not Linux based, as I was told): RDP apps behind this FW are
"slow" (probably laggy) when the FortiGate VM is on the new kernel...

But I'm sure we will move to 4.4; this bug is really driving me crazy... :(

Thx




-- 

Andrija Panić

Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Ivan Kudryavtsev <ku...@bw-sw.com>.
Andrija, I saw this in the past. The problem might be connected with the
kernel version and vnet itself. Try to look for it. I don't remember how we
overcame it in the past...


Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Wei ZHOU <us...@gmail.com>.
Hi Andrija,

Are you using an advanced zone with isolated networks or security groups?

-Wei



RE: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Andrija Panic <an...@gmail.com>.
Hi Imran,

Thx for the input, but we are using an advanced zone, and the guest traffic
is on a private network... the IP is not duplicated.

Thx

On Oct 10, 2017 07:27, "Imran Ahmed" <im...@eaxiom.net> wrote:

> Hi Andrija,
>
> One more thing you can check is whether there is an IP address
> conflict somewhere. Please see if the public IP pool assigned to guest VMs
> by CS is somehow overlapping with any IPs assigned to
> physical/virtual machines somewhere.
>
> Kind regards,
>
> Imran
>
> -----Original Message-----
> From: Andrija Panic [mailto:andrija.panic@gmail.com]
> Sent: Tuesday, October 10, 2017 2:37 AM
> To: users@cloudstack.apache.org
> Cc: dev@cloudstack.apache.org
> Subject: Re: Help/Advice needed - some traffic don't reach VNET / VM
>
> Hi guys,
>
> thanks for quick reply:
>
> - The VM issue happens mostly on Windows (one customer has particularly bad
> luck, it seems), but afaik it also happens on Linux and on FortiOS (some FW
> stuff, not pure Linux) - both are running PV stuff (Windows PV, or CentOS
> 6.5 x64 OS type)
> - we are actually using LACP on the switches, and I also disabled/downed one
> bond interface on the host - although that makes zero sense, because the
> packet already arrived via this bond0 to the bond0.XXX vlan (VTEP), then the
> packet also arrived at the child vxlan interface (vxlan on top of vlan on
> top of bond...), and then the packet also arrived at the bridge, but was
> never passed to the VNET.
>
> My expectation is that this is a purely inside-the-host problem, since the
> packet arrives from the outside physical network to the host's
> vxlan/bridge... but not to the VNET.
> Seems like some qemu issue, but using google I found nothing that looks
> similar to our issue.
>
> Have no idea...
>
>
>
> On 9 October 2017 at 23:08, Dag Sonstebo <Da...@shapeblue.com>
> wrote:
>
> > Hi Andrija,
> >
> > Do you use NIC bonds? I have seen this before when using active-active
> > bonds, and as you say it can be very difficult to troubleshoot and the
> > behaviour makes little sense. What can happen is that network traffic is
> > load balanced between the two NICs, but the update frequency of the MAC
> > tables between the two switches doesn't keep up with the load-balanced
> > traffic. In other words, a MAC address which used to transmit on
> > hypervisor eth0 of a bond (attached to your first top-of-rack switch) has
> > suddenly, due to load, started transmitting on eth1 of the bond (attached
> > to the second top-of-rack switch), while the physical switch stack still
> > thinks the MAC address lives on eth0, hence traffic is dropped until the
> > next time the switches sync MAC tables.
> >
> > We used to see this a lot in the past on XenServer - the solution being
> > to move to active-passive bond modes, or to go up to LACP/802.3ad if your
> > hardware allows for it. The same principle will however also apply to
> > generic linux bonds.
> >
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >  S: +44 20 3603 0540  | dag.sonstebo@shapeblue.com |
> > http://www.shapeblue.com <http://www.shapeblue.com/> |
> Twitter:@ShapeBlue
> > <https://twitter.com/#!/shapeblue>
> >
> >
> > Dag.Sonstebo@shapeblue.com
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > @shapeblue
> >
> >
> >
> >
>
>
> --
>
> Andrija Panić
>
>

RE: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Andrija Panic <an...@gmail.com>.
Hi Imran,

Thx for the input, but we are using an advanced zone, and the guest traffic is
private... the IP is not duplicated.

Thx

On Oct 10, 2017 07:27, "Imran Ahmed" <im...@eaxiom.net> wrote:

> Hi Andrija,
>
> One more thing you can check is that see if there is an IP address
> conflict somewhere.  Please see if the public IP pool assigned to guest VMs
> by CS is somehow overlapping with any IPs which are assigned to physical
> /virtual machines somewhere.
>
> Kind regards,
>
> Imran
>

RE: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Imran Ahmed <im...@eaxiom.net>.
Hi Andrija,

One more thing you can check is that see if there is an IP address conflict somewhere.  Please see if the public IP pool assigned to guest VMs by CS is somehow overlapping with any IPs which are assigned to physical /virtual machines somewhere. 

Kind regards,

Imran 
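Imran's suggestion above can be checked from the KVM host with duplicate-address
detection. A minimal dry-run sketch (the interface and IP below are placeholders,
substitute your bridge name and the VM's address):

```shell
# Dry run: print the duplicate-address-detection command you would run from
# the KVM host. IFACE/ADDR are placeholders for your bridge and the VM's IP.
IFACE=cloudbr0
ADDR=10.1.1.15
# arping -D (DAD mode) exits non-zero if another machine answers for ADDR
echo "arping -D -I $IFACE -c 3 $ADDR"
```

If another host replies for the address, you have a conflict on that segment.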



Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Andrija Panic <an...@gmail.com>.
Hi guys,

thanks for quick reply:

- The VM issue happens mostly on Windows (one customer has particularly bad
luck, it seems), but afaik it also happens on Linux and on FortiOS (a firewall
OS, not pure Linux) - both run PV drivers (Windows PV, or the CentOS 6.5 x64
OS type)
- we are actually using LACP on the switches, and I also disabled/downed one
bond interface on the host - although that makes zero sense, because the packet
already arrived via this bond0 to the bond0.XXX VLAN (VTEP), then the packet
also arrived at the child vxlan interface (vxlan on top of vlan on top of
bond...), and then the packet also arrived at the bridge, but was never passed
to the VNET.

My expectation is that this is purely an inside-the-host problem, since the
packet arrives from the outside physical network at the host's vxlan/bridge...
but not at the VNET.
Seems like some qemu issue, but I found nothing on Google that looks similar
to our issue.

Have no idea...
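For reference, the per-layer capture described earlier in the thread can be
scripted. The interface names below are examples only (your VLAN, VXLAN,
bridge, and vnet names will differ), and SRC stands in for the "problematic"
source IP:

```shell
# Print a tcpdump invocation for each layer the packet should traverse on the
# host, outermost to innermost. All names here are illustrative placeholders.
SRC=192.0.2.10
for IFACE in bond0.100 vxlan1000 brvx-1000 vnet12; do
  echo "tcpdump -ni $IFACE icmp and host $SRC"
done
```

Whichever layer the packet disappears at narrows the fault: vanishing between
the bridge and the vnet, as here, points at the bridge forwarding entry or the
tap device itself rather than the physical network.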





-- 

Andrija Panić


Re: Help/Advice needed - some traffic don't reach VNET / VM

Posted by Dag Sonstebo <Da...@shapeblue.com>.
Hi Andrija,

Do you use NIC bonds? I have seen this before when using active-active bonds, and as you say it can be very difficult to troubleshoot and the behaviour makes little sense. What can happen is that network traffic is load balanced between the two NICs, but the update frequency of the MAC tables between the two switches doesn't keep up with the load-balanced traffic. In other words, a MAC address which used to transmit on hypervisor eth0 of a bond (attached to your first top-of-rack switch) has suddenly, due to load, started transmitting on eth1 of the bond (attached to the second top-of-rack switch), yet the physical switch stack still thinks the MAC address lives on eth0, so traffic is dropped until the next time the switches sync MAC tables.

We used to see this a lot in the past on XenServer – the solution being moving to active-passive bond modes, or going up to LACP/802.3ad if your hardware allows for it. The same principle will, however, also apply to generic Linux bonds.
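To confirm which bonding mode a Linux host is actually running, the bonding
driver exposes it under /proc (usually /proc/net/bonding/bond0, adjust for
your bond name). The snippet below parses a captured sample of that file so
the extraction logic runs anywhere:

```shell
# Parse "Bonding Mode" out of /proc/net/bonding/<bond>. A sample of the file
# is embedded here so the snippet is reproducible without a real bond.
sample='Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)'
mode=$(printf '%s\n' "$sample" | awk -F': ' '/^Bonding Mode/ {print $2}')
echo "bond mode: $mode"
```

On a live host you would feed `awk` the real file instead of the sample; any
mode other than 802.3ad or active-backup is worth a second look with this
failure pattern.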

Regards, 
Dag Sonstebo
Cloud Architect
ShapeBlue
 S: +44 20 3603 0540  | dag.sonstebo@shapeblue.com | http://www.shapeblue.com <http://www.shapeblue.com/> | Twitter:@ShapeBlue <https://twitter.com/#!/shapeblue>


On 09/10/2017, 21:52, "Andrija Panic" <an...@gmail.com> wrote:

    Hi guys,
    
    we have occasional but serious problem, that starts happening as it seems
    randomly (i.e. NOT under high load)  - not ACS related afaik, purely KVM,
    but feedback is really welcomed.
    
    - VM is reachable in general from everywhere, but not reachable from
    specific IP address ?!
    - VM is NOT under high load, network traffic next to zero, same for
    CPU/disk...
    - We mitigate this problem by migrating VM away to another host, not much
    of a solution...
    
    Description of problem:
    
    We let ping from "problematic" source IP address to the problematic VM, and
    we capture traffic on KVM host where the problematic VM lives:
    
    - Tcpdump on VXLAN interface (physical incoming interface on the host) - we
    see packet fine
    - tcpdump on BRIDGE = we see packet fine
    - tcpdump on VNET = we DON'T see packet.
    
    In the scenario above, I need to say that :
    - we can tcpdump packets from other source IPs on the VNET interface just
    fine (as expected), so should also see this problematic source IP's packets
    - we can actually ping in the opposite direction - from the problematic VM to
    the problematic "source" IP
    
    We checked everything possible, from bridge port forwarding, to mac-to-vtep
    mapping, to many other things, removed traffic shaping from VNET interface,
    no iptables/ebtables, no STP on bridge, remove and rejoin interfaces to
    bridge, destroy bridge and create manually on the fly,
    
    Problem is really crazy, and I can not explain it - no iptables, no
    ebtables for troubleshooting purposes (on this host) and
    
    We mitigate this problem by migrating VM away to another host, not much of
    a solution...
    
    This is Ubuntu 14.04, Qemu 2.5 (libvirt 1.3.1),
    Stock kernel 3.16-xx, regular bridge (not OVS)
    
    Anyone else ever heard of such problem - this is not intermittent packet
    dropping, but complete blackout/packet drop in some way...
    
    Thanks,
    
    -- 
    
    Andrija Panić
    


Dag.Sonstebo@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

