Posted to dev@cloudstack.apache.org by Andrei Mikhailovsky <an...@arhont.com.INVALID> on 2018/04/20 08:51:30 UTC

Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

Hello, 

I have been posting to the users thread about this issue. Here is a quick summary in case people contributing to the source NAT code on the VPC side would like to fix it. 


Problem summary: no connectivity between virtual machines behind two Static NAT networks. 

Problem case: when one virtual machine sends a packet to the external address of another virtual machine, where both are handled by the same router and both are behind Static NAT, the traffic does not work. 



 10.1.10.100     10.1.10.1:eth2   eth3:10.1.20.1     10.1.20.100 
    virt1  <--->            router            <--->     virt2 
               178.248.108.77:eth1:178.248.108.113 


A single packet is sent from virt1 to virt2. 


stage1: the packet arrives at the router on eth2 and enters "nat_PREROUTING" 
IN=eth2 OUT= SRC=10.1.10.100 DST=178.248.108.113 

It matches the rule 
"10 1K DNAT all -- * * 0.0.0.0/0 178.248.108.113 to:10.1.20.100" 
and has the DST DNATed to the internal IP of virt2. 


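For reference, that counters line corresponds to a DNAT rule of roughly this shape (a sketch reconstructed from the listing above; the exact chain and options programmed on the VR may differ): 

# iptables -t nat -A PREROUTING -d 178.248.108.113/32 -j DNAT --to-destination 10.1.20.100 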
stage2: the packet enters the FORWARD chain and is DROPPED by the default policy. 
DROPPED:IN=eth2 OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100 

The reason is that the OUT interface is not correctly changed from eth1 to eth3 during nat_PREROUTING, 
so the packet is not matched by the FORWARD rule below and thus not accepted: 
"24 14K ACL_INBOUND_eth3 all -- * eth3 0.0.0.0/0 10.1.20.0/24" 

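The DROPPED: prefix in that trace comes from a diagnostic LOG rule; a minimal sketch of such a rule, assuming it sits at the end of the FORWARD chain so it catches packets about to hit the default DROP policy: 

# iptables -A FORWARD -j LOG --log-prefix "DROPPED:" 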

stage3: a manually inserted rule (sketched after this trace) accepts the packet for FORWARDING. 
The packet then enters the "nat_POSTROUTING" chain 
IN= OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100 

and has the SRC changed to the external IP by the rule 
16 1320 SNAT all -- * eth1 10.1.10.100 0.0.0.0/0 to:178.248.108.77 

and is sent to the external network on eth1: 
13:37:44.834341 IP 178.248.108.77 > 10.1.20.100: ICMP echo request, id 2644, seq 2, length 64 
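For illustration, a rule of this shape would accept the packet that was dropped in stage2 (a sketch; the exact rule used is not shown above): 

# iptables -I FORWARD -s 10.1.10.100 -d 10.1.20.100 -j ACCEPT 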


During the nat_PREROUTING stage the DST IP is changed, but the OUT interface still reflects the interface associated with the old DST IP. The route lookup that picks the OUT interface only happens after PREROUTING, so whatever steers that lookup (the fwmark rules shown below) decides where the packet goes, regardless of the new DST. 

Here is the routing table 
# ip route list 
default via 178.248.108.1 dev eth1 
10.1.10.0/24 dev eth2 proto kernel scope link src 10.1.10.1 
10.1.20.0/24 dev eth3 proto kernel scope link src 10.1.20.1 
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.0.5 
178.248.108.0/25 dev eth1 proto kernel scope link src 178.248.108.101 

# ip rule list 
0: from all lookup local 
32761: from all fwmark 0x3 lookup Table_eth3 
32762: from all fwmark 0x2 lookup Table_eth2 
32763: from all fwmark 0x1 lookup Table_eth1 
32764: from 10.1.0.0/16 lookup static_route_back 
32765: from 10.1.0.0/16 lookup static_route 
32766: from all lookup main 
32767: from all lookup default 


Further investigation pinned the problem down to the rules below. 
All traffic from the internal IPs on static NATed connections was forced out of the outside interface (eth1), by setting mark 0x1 and then using the matching ip rule to direct it. 

# iptables -t mangle -L PREROUTING -vn 
Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes) 
pkts bytes target prot opt in out source destination 
49 3644 CONNMARK all -- * * 10.1.10.100 0.0.0.0/0 state NEW CONNMARK save 
37 2720 MARK all -- * * 10.1.20.100 0.0.0.0/0 state NEW MARK set 0x1 
37 2720 CONNMARK all -- * * 10.1.20.100 0.0.0.0/0 state NEW CONNMARK save 
114 8472 MARK all -- * * 10.1.10.100 0.0.0.0/0 state NEW MARK set 0x1 
114 8472 CONNMARK all -- * * 10.1.10.100 0.0.0.0/0 state NEW CONNMARK save 


# ip rule 
0: from all lookup local 
32761: from all fwmark 0x3 lookup Table_eth3 
32762: from all fwmark 0x2 lookup Table_eth2 
32763: from all fwmark 0x1 lookup Table_eth1 
32764: from 10.1.0.0/16 lookup static_route_back 
32765: from 10.1.0.0/16 lookup static_route 
32766: from all lookup main 
32767: from all lookup default 
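
The fwmark 0x1 -> Table_eth1 hop can be confirmed by listing that table. The output below is a sketch, assuming the table carries only a default route via the public gateway: 

# ip route list table Table_eth1 
default via 178.248.108.1 dev eth1 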


An acceptable solution is to delete those MARK rules altogether. 

The problem with that approach is that inter-tier traffic within the VPC will then use the internal IP addresses, 
so packets going from 178.248.108.77 to 178.248.108.113 
would be seen as communication between 10.1.10.100 and 10.1.20.100. 

Thus we need to apply two further rules 
# iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77 
# iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113 

to make sure that packets leaving the router have the correct source IP. 
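
A quick way to confirm this works is to watch the guest-facing interface while pinging (a sketch): 

# tcpdump -ni eth3 icmp 

The echo request should leave eth3 with the source already rewritten, e.g. IP 178.248.108.77 > 10.1.20.100, mirroring the eth1 capture above. 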

This way it is possible to have static NAT on all of the IPs within the VPC and ensure successful communication between them. 


So, as a quick and dirty fix, we ran this command on the VR: 

for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do iptables -t mangle -D PREROUTING -s $i -m state --state NEW -j MARK --set-mark "0x1"; done 
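
This deletes every MARK set 0x1 rule whose listing line does not mention eth1; $8 is the source column of the -vn output. A quick check that the marks are gone (sketch): 

# iptables -t mangle -L PREROUTING -vn | grep 0x1 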



The issue was introduced around the early 4.9.x releases, I believe. 


Thanks 

Andrei 





----- Original Message ----- 
> From: "Andrei Mikhailovsky" <an...@arhont.com.INVALID> 
> To: "users" <us...@cloudstack.apache.org> 
> Sent: Monday, 16 April, 2018 22:32:25 
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0 

> Hello, 
> 
> I have done some more testing with the VPC network tiers and it seems that the 
> Static NAT is indeed causing connectivity issues. Here is what I've done: 
> 
> 
> Setup 1. I have created two test network tiers with one guest vm in each tier. 
> Static NAT is NOT enabled. Each VM has a port forwarding rule (port 22) from 
> its dedicated public IP address. ACLs have been setup to allow traffic on port 
> 22 from the private ip addresses on each network tier. 
> 
> 1. ACLs seem to work just fine. Traffic between the networks flows according to 
> the rules. Both vms can see each other's private IPs and can ping/ssh/etc 
> 
> 2. From the Internet hosts can access vms on port 22 
> 
> 4. The vms can also access each other and themselves on their public IPs. I don't 
> think this worked before, but could be wrong. 
> 
> 
> 
> Setup 2. Everything the same as Setup 1, but one public IP address has been 
> setup as Static NAT to one guest vm. the second guest vm and second public IP 
> remained unchanged. 
> 
> 1. ACLs stopped working correctly (see below) 
> 
> 2. From the Internet hosts can access vms on port 22, including the Static NAT 
> vm 
> 
> 3. Other guest vms can access the Static NAT vm using private & public IP 
> addresses 
> 
> 4. Static NAT vm can NOT access other vms using either public or private IPs 
> 
> 5. Static NAT vm can access the internet hosts (apart from the public IP range 
> belonging to the cloudstack setup) 
> 
> 
> The above behaviour of Setup 2 scenarios is very strange, especially points 4 & 
> 5. 
> 
> Any thoughts anyone? 
> 
> Cheers 
> 
> ----- Original Message ----- 
>> From: "Rohit Yadav" <ro...@shapeblue.com> 
>> To: "users" <us...@cloudstack.apache.org> 
>> Sent: Thursday, 12 April, 2018 12:06:54 
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0 
> 
>> Hi Andrei, 
>> 
>> 
>> Thanks for sharing, yes the egress thing is a known issue which is caused due to 
>> failure during VR setup to create egress table. By performing a restart of the 
>> network (without cleanup option selected), the egress table gets created and 
>> rules are successfully applied. 
>> 
>> 
>> The issue has been fixed in the vr downtime pr: 
>> 
>> https://github.com/apache/cloudstack/pull/2508 
>> 
>> 
>> - Rohit 
>> 
>> <https://cloudstack.apache.org> 
>> 
>> 
>> 
>> ________________________________ 
>> From: Andrei Mikhailovsky <an...@arhont.com.INVALID> 
>> Sent: Tuesday, April 3, 2018 3:33:43 PM 
>> To: users 
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0 
>> 
>> Rohit, 
>> 
>> Following the update from 4.9.3 to 4.11.0, I would like to comment on a few 
>> things: 
>> 
>> 1. The upgrade went well, apart from the cloudstack-management server startup 
>> issue that I've described in my previous email. 
>> 2. there was an issue with the virtual router template upgrade. The issue is 
>> described below: 
>> 
>> VR template upgrade issue: 
>> 
>> After updating the systemvm template I went onto the Infrastructure > Virtual 
>> Routers and selected the Update template option for each virtual router. The 
>> virtual routers were updated successfully using the new templates. However, 
>> this has broken ALL Egress rules on all networks; none of the guest vms had working egress. 
>> Port forwarding / incoming rules were working just fine. Removal and addition 
>> of Egress rules did not fix the issue. To fix the issue I had to restart each 
>> of the networks with the Clean up option ticked. 
>> 
>> 
>> Cheers 
>> 
>> Andrei 
>> 
>> 
>> 
>> 
>> ----- Original Message ----- 
>>> From: "Andrei Mikhailovsky" <an...@arhont.com.INVALID> 
>>> To: "users" <us...@cloudstack.apache.org> 
>>> Sent: Monday, 2 April, 2018 21:44:27 
>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0 
>> 
>>> Hi Rohit, 
>>> 
>>> Following some further investigation it seems that the installation packages 
>>> replaced the following file: 
>>> 
>>> /etc/default/cloudstack-management 
>>> 
>>> with 
>>> 
>>> /etc/default/cloudstack-management.dpkg-dist 
>>> 
>>> 
>>> Thus, the management server couldn't load the env variables and thus was unable 
>>> to start. 
>>> 
>>> I've put the file back and the management server is able to start. 
>>> 
>>> I will let you know if there are any other issues/problems. 
>>> 
>>> Cheers 
>>> 
>>> Andrei 
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>>> From: "Andrei Mikhailovsky" <an...@arhont.com.INVALID> 
>>>> To: "users" <us...@cloudstack.apache.org> 
>>>> Sent: Monday, 2 April, 2018 20:58:59 
>>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0 
>>> 
>>>> Hi Rohit, 
>>>> 
>>>> I have just upgraded and having issues starting the service with the following 
>>>> error: 
>>>> 
>>>> 
>>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service: 
>>>> Failed to load environment files: No such file or directory 
>>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service: 
>>>> Failed to run 'start-pre' task: No such file or directory 
>>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack 
>>>> Management Server. 
>>>> -- Subject: Unit cloudstack-management.service has failed 
>>>> -- Defined-By: systemd 
>>>> 
>>>> Cheers 
>>>> 
>>>> Andrei 
>>>> 
>>>> ----- Original Message ----- 
>>>>> From: "Rohit Yadav" <ro...@shapeblue.com> 
>>>>> To: "users" <us...@cloudstack.apache.org> 
>>>>> Sent: Friday, 30 March, 2018 19:17:48 
>>>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0 
>>>> 
>>>>> Some of the upgrade and minor issues have been fixed and will make their way 
>>>>> into 4.11.1.0. You're welcome to upgrade and share your feedback, but bear in 
>>>>> mind due to some changes a new/updated systemvmtemplate need to be issued for 
>>>>> 4.11.1.0 (it will be compatible for both 4.11.0.0 and 4.11.1.0 releases, but 
>>>>> 4.11.0.0 users will have to register that new template). 
>>>>> 
>>>>> 
>>>>> 
>>>>> - Rohit 
>>>>> 
>>>>> <https://cloudstack.apache.org> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ________________________________ 
>>>>> From: Andrei Mikhailovsky <an...@arhont.com.INVALID> 
>>>>> Sent: Friday, March 30, 2018 11:00:34 PM 
>>>>> To: users 
>>>>> Subject: Upgrade from ACS 4.9.3 to 4.11.0 
>>>>> 
>>>>> Hello, 
>>>>> 
>>>>> My current infrastructure is ACS 4.9.3 with KVM based on Ubuntu 16.04 servers 
>>>>> for the KVM hosts and the management server. 
>>>>> 
>>>>> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was wondering 
>>>>> if anyone had any issues during the upgrades? Anything to watch out for? 
>>>>> 
>>>>> I have previously seen issues with upgrading to 4.10, which required some manual 
>>>>> db updates from what I recall. Has this issue been fixed in the 4.11 upgrade 
>>>>> process? 
>>>>> 
>>>>> thanks 
>>>>> 
>>>>> Andrei 
>>>>> 

Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

Posted by Andrei Mikhailovsky <an...@arhont.com.INVALID>.
Hi Andrija,


From what I recall this was not an issue for us on 4.9.x. The problem started after we upgraded. We do have a few networks that require static NAT, so the port forwarding workaround is not really an option for us.

It's a shame that such an artefact wasn't identified during the automated / manual testing prior to the release, and that the fix wasn't included in the latest point release despite it having fixes for over 100 issues, some of which are far less serious. Not too sure what to think of it, to be honest. Seems like one step forward, two steps backwards with the new releases :(

Andrei


----- Original Message -----
> From: "Andrija Panic" <an...@gmail.com>
> To: "dev" <de...@cloudstack.apache.org>
> Sent: Monday, 9 July, 2018 22:39:06
> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

> Andrei, if not mistaken I believe I saw same behavior even on 4.8 - in our
> case, what I vaguely remember was, that we configure Port Forwarding
> instead of Static NAT - it did solve our use case (for some customer), but
> maybe it's not acceptable for you...
> 
> Cheers
> 
> On Mon, 9 Jul 2018 at 18:27, Andrei Mikhailovsky <an...@arhont.com.invalid>
> wrote:
> 
>> Hi Rohit,
>>
>> I would like to send you a quick update on this issue. I have recently
>> upgraded to 4.11.1.0 with the new system vm templates. The issue that I've
>> described is still present in the latest release. Hasn't it been included
>> in the latest 4.11 maintenance release? I thought that it would be as it
>> breaks the major function of the VPC.
>>
>> Cheers.
>>
>> Andrei
>>
>> ----- Original Message -----
>> > From: "Andrei Mikhailovsky" <an...@arhont.com.INVALID>
>> > To: "dev" <de...@cloudstack.apache.org>
>> > Sent: Friday, 20 April, 2018 11:52:30
>> > Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
>>
>> > Thanks
>> >
>> >
>> >
>> > ----- Original Message -----
>> >> From: "Rohit Yadav" <ro...@shapeblue.com>
>> >> To: "dev" <de...@cloudstack.apache.org>, "dev" <dev@cloudstack.apache.org
>> >
>> >> Sent: Friday, 20 April, 2018 10:35:55
>> >> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
>> >
>> >> Hi Andrei,
>> >>
>> >> I've fixed this recently, please see
>> >> https://github.com/apache/cloudstack/pull/2579
>> >>
>> >> As a workaround you can add routing rules manually. On the PR, there is
>> a link
>> >> to a comment that explains the issue and suggests manual workaround.
>> Let me
>> >> know if that works for you.
>> >>
>> >> Regards.
>> >>
>> >>
>> >> From: Andrei Mikhailovsky
>> >> Sent: Friday, 20 April, 2:21 PM
>> >> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
>> >> To: dev
>> >>
>> >>
> 
> 
> --
> 
> Andrija Panić

Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

Posted by Andrija Panic <an...@gmail.com>.
Andrei, if I'm not mistaken I believe I saw the same behavior even on 4.8 - in our
case, what I vaguely remember is that we configured Port Forwarding
instead of Static NAT - it did solve our use case (for some customer), but
maybe it's not acceptable for you...

Cheers

On Mon, 9 Jul 2018 at 18:27, Andrei Mikhailovsky <an...@arhont.com.invalid>
wrote:

> Hi Rohit,
>
> I would like to send you a quick update on this issue. I have recently
> upgraded to 4.11.1.0 with the new system vm templates. The issue that I've
> described is still present in the latest release. Hasn't it been included
> in the latest 4.11 maintenance release? I thought that it would be as it
> breaks the major function of the VPC.
>
> Cheers.
>
> Andrei
>
> ----- Original Message -----
> > From: "Andrei Mikhailovsky" <an...@arhont.com.INVALID>
> > To: "dev" <de...@cloudstack.apache.org>
> > Sent: Friday, 20 April, 2018 11:52:30
> > Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
>
> > Thanks
> >
> >
> >
> > ----- Original Message -----
> >> From: "Rohit Yadav" <ro...@shapeblue.com>
> >> To: "dev" <de...@cloudstack.apache.org>, "dev" <dev@cloudstack.apache.org
> >
> >> Sent: Friday, 20 April, 2018 10:35:55
> >> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >
> >> Hi Andrei,
> >>
> >> I've fixed this recently, please see
> >> https://github.com/apache/cloudstack/pull/2579
> >>
> >> As a workaround you can add routing rules manually. On the PR, there is
> a link
> >> to a comment that explains the issue and suggests manual workaround.
> Let me
> >> know if that works for you.
> >>
> >> Regards.
> >>
> >>
> >> From: Andrei Mikhailovsky
> >> Sent: Friday, 20 April, 2:21 PM
> >> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >> To: dev
> >>
> >>
>


-- 

Andrija Panić

Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

Posted by Andrei Mikhailovsky <an...@arhont.com.INVALID>.
Hi Rohit,

I would like to send you a quick update on this issue. I have recently upgraded to 4.11.1.0 with the new system vm templates. The issue that I've described is still present in the latest release. Hasn't the fix been included in the latest 4.11 maintenance release? I thought that it would be, as it breaks a major function of the VPC.

Cheers.

Andrei

----- Original Message -----
> From: "Andrei Mikhailovsky" <an...@arhont.com.INVALID>
> To: "dev" <de...@cloudstack.apache.org>
> Sent: Friday, 20 April, 2018 11:52:30
> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

> Thanks
> 
> 
> 
> ----- Original Message -----
>> From: "Rohit Yadav" <ro...@shapeblue.com>
>> To: "dev" <de...@cloudstack.apache.org>, "dev" <de...@cloudstack.apache.org>
>> Sent: Friday, 20 April, 2018 10:35:55
>> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> 
>> Hi Andrei,
>> 
>> I've fixed this recently, please see
>> https://github.com/apache/cloudstack/pull/2579
>> 
>> As a workaround you can add routing rules manually. On the PR, there is a link
>> to a comment that explains the issue and suggests manual workaround. Let me
>> know if that works for you.
>> 
>> Regards.
>> 
>> 
>> From: Andrei Mikhailovsky
>> Sent: Friday, 20 April, 2:21 PM
>> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
>> To: dev
>> 
>> 

Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

Posted by Andrei Mikhailovsky <an...@arhont.com.INVALID>.
Thanks



----- Original Message -----
> From: "Rohit Yadav" <ro...@shapeblue.com>
> To: "dev" <de...@cloudstack.apache.org>, "dev" <de...@cloudstack.apache.org>
> Sent: Friday, 20 April, 2018 10:35:55
> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

> Hi Andrei,
> 
> I've fixed this recently, please see
> https://github.com/apache/cloudstack/pull/2579
> 
> As a workaround you can add routing rules manually. On the PR, there is a link
> to a comment that explains the issue and suggests manual workaround. Let me
> know if that works for you.
> 
> Regards.
> 
> 
> From: Andrei Mikhailovsky
> Sent: Friday, 20 April, 2:21 PM
> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> To: dev
> 
> 
> Hello,
> 
> I have been posting to the users thread about this issue. Here is a quick
> summary in case people contributing to the source NAT code on the VPC side
> would like to fix this issue.
> 
> Problem summary: no connectivity between virtual machines behind two Static
> NAT networks.
> 
> Problem case: when one virtual machine sends a packet to the external address
> of another virtual machine, and both are handled by the same router and are
> behind Static NAT, the traffic does not work.
> 
> 10.1.10.100    10.1.10.1:eth2    eth3:10.1.20.1    10.1.20.100
> virt1 <---> router <---> virt2
> 178.248.108.77:eth1:178.248.108.113
> 
> A single packet is sent from virt1 to virt2.
> 
> stage1: it arrives at the router on eth2 and enters "nat_PREROUTING"
> (IN=eth2 OUT= SRC=10.1.10.100 DST=178.248.108.113),
> goes through the rule
> "10 1K DNAT all -- * * 0.0.0.0/0 178.248.108.113 to:10.1.20.100"
> and has the DST DNATed to the internal IP of virt2.
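For reference, the DNAT counter line quoted above corresponds to a nat-table rule of roughly this shape; a sketch reconstructed from the listing (the chain and position the VR actually uses for static NAT are assumptions here):

    # Sketch: the static NAT DNAT rule in command form (reconstructed from
    # the counter listing above; the actual chain/position may differ)
    iptables -t nat -A PREROUTING -d 178.248.108.113/32 \
        -j DNAT --to-destination 10.1.20.100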
> 
> stage2: the packet enters the FORWARD chain and is DROPPED by the default
> policy:
> DROPPED: IN=eth2 OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100
> 
> The reason is that the OUT interface is not correctly changed from eth1 to
> eth3 during nat_PREROUTING, so the packet is not matched by the FORWARD rule
> "24 14K ACL_INBOUND_eth3 all -- * eth3 0.0.0.0/0 10.1.20.0/24"
> and is thus not accepted.
> 
> stage3: with a manually inserted rule to accept this packet for FORWARDING,
> the packet enters the "nat_POSTROUTING" chain
> (IN= OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100),
> has the SRC changed to the external IP by
> "16 1320 SNAT all -- * eth1 10.1.10.100 0.0.0.0/0 to:178.248.108.77"
> and is sent to the external network on eth1:
> 13:37:44.834341 IP 178.248.108.77 > 10.1.20.100: ICMP echo request, id 2644, seq 2, length 64
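The "manually inserted rule" mentioned in stage3 is not shown in the message; a minimal sketch that would accept this particular flow for forwarding, using the addresses from the example, might be:

    # Sketch only: accept the DNATed tier-to-tier flow in FORWARD so that it
    # reaches nat_POSTROUTING (the exact rule used on the VR is not quoted)
    iptables -I FORWARD -i eth2 -s 10.1.10.100 -d 10.1.20.100 -j ACCEPT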
> 
> For some reason, during the nat_PREROUTING stage the DST_IP is changed, but
> the OUT interface still reflects the interface associated with the old
> DST_IP.
> 
> Here is the routing table:
> 
> # ip route list
> default via 178.248.108.1 dev eth1
> 10.1.10.0/24 dev eth2 proto kernel scope link src 10.1.10.1
> 10.1.20.0/24 dev eth3 proto kernel scope link src 10.1.20.1
> 169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.0.5
> 178.248.108.0/25 dev eth1 proto kernel scope link src 178.248.108.101
> 
> # ip rule list
> 0:     from all lookup local
> 32761: from all fwmark 0x3 lookup Table_eth3
> 32762: from all fwmark 0x2 lookup Table_eth2
> 32763: from all fwmark 0x1 lookup Table_eth1
> 32764: from 10.1.0.0/16 lookup static_route_back
> 32765: from 10.1.0.0/16 lookup static_route
> 32766: from all lookup main
> 32767: from all lookup default
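To see how the fwmark drives the egress-interface choice, the kernel's routing decision can be queried with and without the mark; a diagnostic sketch (the exact output depends on the contents of Table_eth1):

    # Without the mark, the kernel should pick the connected tier route (eth3)
    ip route get 10.1.20.100 from 10.1.10.100 iif eth2

    # With fwmark 0x1, the lookup hits Table_eth1 and should egress via eth1
    ip route get 10.1.20.100 from 10.1.10.100 iif eth2 mark 0x1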
> 
> Further into the investigation, the problem was pinned down to those rules.
> All the traffic from the internal IPs on the static NATed connections was
> forced to go to the outside interface (eth1), by setting the mark 0x1 and
> then using the matching ip rule to direct it.
> 
> # iptables -t mangle -L PREROUTING -vn
> Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes)
> pkts bytes target   prot opt in out source      destination
>   49  3644 CONNMARK all  --  *  *   10.1.10.100 0.0.0.0/0   state NEW CONNMARK save
>   37  2720 MARK     all  --  *  *   10.1.20.100 0.0.0.0/0   state NEW MARK set 0x1
>   37  2720 CONNMARK all  --  *  *   10.1.20.100 0.0.0.0/0   state NEW CONNMARK save
>  114  8472 MARK     all  --  *  *   10.1.10.100 0.0.0.0/0   state NEW MARK set 0x1
>  114  8472 CONNMARK all  --  *  *   10.1.10.100 0.0.0.0/0   state NEW CONNMARK save
> 
> # ip rule
> 0:     from all lookup local
> 32761: from all fwmark 0x3 lookup Table_eth3
> 32762: from all fwmark 0x2 lookup Table_eth2
> 32763: from all fwmark 0x1 lookup Table_eth1
> 32764: from 10.1.0.0/16 lookup static_route_back
> 32765: from 10.1.0.0/16 lookup static_route
> 32766: from all lookup main
> 32767: from all lookup default
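In command form, each MARK line in that listing corresponds to a rule of this shape; a reconstruction that mirrors the -D form used in the cleanup loop below:

    # Sketch: per-guest-IP mark rule as the VR appears to install it
    iptables -t mangle -A PREROUTING -s 10.1.10.100 \
        -m state --state NEW -j MARK --set-mark 0x1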
> The acceptable solution is to delete those rules altogether. The problem with
> such an approach is that the traffic between VPC tiers will then use the
> internal IP addresses, so the packets going from 178.248.108.77 to
> 178.248.108.113 would be seen as communication between 10.1.10.100 and
> 10.1.20.100. Thus we need to apply two further rules
> 
> # iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
> # iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113
> 
> in order to make sure that the packets leaving the router have the correct
> source IP. This way it is possible to have static NAT on all of the IPs
> within the VPC and ensure successful communication between them.
> 
> So, for a quick and dirty fix, we ran this command on the VR:
> 
> for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do
>     iptables -t mangle -D PREROUTING -s $i -m state --state NEW -j MARK --set-mark "0x1"
> done
> 
> The issue was introduced around the early 4.9.x releases, I believe.
> 
> Thanks
> 
> Andrei
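Putting the workaround pieces from the quoted report together, a consolidated sketch for this particular example VR could look like the following (the interfaces and addresses are the ones from this thread, not general values; treat it as illustrative rather than the fix that landed in PR #2579):

    #!/bin/sh
    # Workaround sketch for the example above: remove the 0x1 mark rules
    # for static-NATed guest IPs (everything except the eth1/public path)...
    for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do
        iptables -t mangle -D PREROUTING -s "$i" -m state --state NEW -j MARK --set-mark 0x1
    done

    # ...then re-apply SNAT for tier-to-tier traffic so the packets keep
    # their public source addresses (values from the example setup)
    iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
    iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113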

Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

Posted by Rohit Yadav <ro...@shapeblue.com>.
Hi Andrei,

I've fixed this recently, please see
https://github.com/apache/cloudstack/pull/2579

As a workaround you can add routing rules manually. On the PR, there is a link to a comment that explains the issue and suggests a manual workaround. Let me know if that works for you.

Regards.

