You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Justyn Shull (JIRA)" <ji...@apache.org> on 2014/09/04 19:49:51 UTC

[jira] [Created] (CLOUDSTACK-7489) Unable to expunge VM due to failing to revoke all static nat rules

Justyn Shull created CLOUDSTACK-7489:
----------------------------------------

             Summary: Unable to expunge VM due to failing to revoke all static nat rules
                 Key: CLOUDSTACK-7489
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7489
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
    Affects Versions: 4.4.0
         Environment: Kernel: 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
OS: CentOS release 6.4 (Final)
CS RPMs:
cloudstack-management-4.4.0-NONOSS_3.el6.x86_64
cloudstack-common-4.4.0-NONOSS_3.el6.x86_64
cloudstack-cli-4.4.0-NONOSS_3.el6.x86_64
cloudstack-usage-4.4.0-NONOSS_3.el6.x86_64
cloudstack-awsapi-4.4.0-NONOSS_3.el6.x86_64

Cloudstack was upgraded from 3.0.6 originally 
            Reporter: Justyn Shull


I've seen these errors a few times in the management-server.log now.   It has happened on a few different networks and ips, but we have not located a root cause for it yet.   It basically prevents the VM from being expunged.   

{code}
# grep 'ctx-2a7eee77' /var/log/cloudstack/management/management-server.log
2014-09-04 10:00:41,982 INFO  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Found 1 vms to expunge.
2014-09-04 10:00:41,994 WARN  [o.a.c.f.j.AsyncJobExecutionContext] (UserVm-Scavenger-10:ctx-2a7eee77) Job is executed without a context, setup psudo job for the executing thread
2014-09-04 10:00:42,160 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Sync job-170172 execution on object VmWorkJobQueue.6715
2014-09-04 10:00:42,170 WARN  [c.c.u.d.Merovingian2] (UserVm-Scavenger-10:ctx-2a7eee77) Was unable to find lock for the key vm_instance6715 and thread id 745113246
2014-09-04 10:00:45,678 DEBUG [c.c.c.CapacityManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) VM state transitted from :Expunging to Expunging with event: ExpungeOperationvm's original host id: 12 new host id: null host id before state transition: null
2014-09-04 10:00:45,680 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Destroying vm VM[User|i-258-6715-VM]
2014-09-04 10:00:45,681 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning up NICS
2014-09-04 10:00:45,681 DEBUG [o.a.c.e.o.NetworkOrchestrator] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning network for vm: 6715
2014-09-04 10:00:45,688 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning up hypervisor data structures (ex. SRs in XenServer) for managed storage
2014-09-04 10:00:45,745 DEBUG [o.a.c.e.o.VolumeOrchestrator] (UserVm-Scavenger-10:ctx-2a7eee77) Cleaning storage for vm: 6715
2014-09-04 10:00:45,759 DEBUG [c.c.v.VirtualMachineManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Expunged VM[User|i-258-6715-VM]
2014-09-04 10:00:45,759 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Starting cleaning up vm VM[User|i-258-6715-VM] resources...
2014-09-04 10:00:45,830 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) No firewall rules are found for vm id=6715
2014-09-04 10:00:45,885 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Firewall rules are removed successfully as a part of vm id=6715 expunge
2014-09-04 10:00:45,896 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) No port forwarding rules are found for vm id=6715
2014-09-04 10:00:45,896 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Port forwarding rules are removed successfully as a part of vm id=6715 expunge
2014-09-04 10:00:45,898 DEBUG [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Removed vm id=6715 from all load balancers as a part of expunge process
2014-09-04 10:00:45,914 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Revoking all Firewallrules as a part of disabling static nat for public IP id=235
2014-09-04 10:00:45,940 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Releasing 0 firewall rules for ip id=235
2014-09-04 10:00:45,943 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) There are no firewall rules to apply
2014-09-04 10:00:45,945 DEBUG [c.c.n.f.FirewallManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Successfully released firewall rules for ip id=235 and # of rules now = 0
2014-09-04 10:00:45,969 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Releasing 0 port forwarding rules for ip id=235
2014-09-04 10:00:45,972 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Releasing 0 static nat rules for ip id=235
2014-09-04 10:00:45,999 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) There are no port forwarding rules to apply for ip id=235
2014-09-04 10:00:46,001 DEBUG [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) There are no static nat rules to apply for ip id=235
2014-09-04 10:00:46,056 INFO  [o.a.c.e.o.NetworkOrchestrator] (UserVm-Scavenger-10:ctx-2a7eee77) Let VirtualRouter handle StaticNat in network 1734
2014-09-04 10:00:46,069 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Applying static nat rules in network Ntwk[effabcf3-9e09-4402-8340-52e0a6b32d60|Guest|8]
2014-09-04 10:00:46,090 DEBUG [c.c.a.m.ClusteredAgentAttache] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397152: Forwarding Seq 14-202943458208397152:  { Cmd , MgmtId: 182571079363322, via: 14(hv-11-1.phx), Ver: v1, Flags: 100001, [{"com.cloud.agent.api.routing.SetStaticNatRulesCommand":{"rules":[{"dstIp":"10.244.0.162","id":0,"srcIp":"100.100.100.242","revoked":true,"alreadyAdded":false,"purpose":"StaticNat","icmpType":0,"icmpCode":0,"defaultEgressPolicy":false}],"accessDetails":{"zone.network.type":"Advanced","router.name":"r-6578-VM","router.ip":"169.254.3.202","router.guest.ip":"10.244.0.1"},"wait":0}}] } to 33862771676063
2014-09-04 10:00:47,068 DEBUG [c.c.a.t.Request] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397152: Received:  { Ans: , MgmtId: 182571079363322, via: 14, Ver: v1, Flags: 0, { Answer } }
2014-09-04 10:00:47,121 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Applying ip association in network Ntwk[effabcf3-9e09-4402-8340-52e0a6b32d60|Guest|8]
2014-09-04 10:00:47,188 DEBUG [c.c.a.m.ClusteredAgentAttache] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397153: Forwarding Seq 14-202943458208397153:  { Cmd , MgmtId: 182571079363322, via: 14(hv-11-1.phx), Ver: v1, Flags: 100001, [{"com.cloud.agent.api.routing.IpAssocCommand":{"ipAddresses":[{"accountId":258,"publicIp":"100.100.176.192","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:67:cc:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false},{"accountId":258,"publicIp":"100.100.176.43","sourceNat":false,"add":true,"oneToOneNat":true,"firstIP":false,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8e:ae:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false},{"accountId":258,"publicIp":"100.100.176.41","sourceNat":false,"add":true,"oneToOneNat":true,"firstIP":false,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8e:ae:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false},{"accountId":258,"publicIp":"100.100.176.42","sourceNat":false,"add":true,"oneToOneNat":true,"firstIP":false,"broadcastUri":"vlan://310","vlanGateway":"100.100.176.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8e:ae:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false}],"accessDetails":{"zone.network.type":"Advanced","router.name":"r-6578-VM","router.ip":"169.254.3.202","router.guest.ip":"10.244.0.1"},"wait":0}},{"com.cloud.agent.api.routing.IpAssocCommand":{"ipAddresses":[{"accountId":258,"publicIp":"100.100.100.242","sourceNat":true,"add":false,"oneToOneNat":true,"firstIP":true,"broadcastUri":"vlan://31","vlanGateway":"100.100.100.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:8c:cf:00:3b:cf","networkRate":1000,"trafficType":"Public","networkName":"cloud-mgmt","newNic":false}],"accessDetails":{"zone.network.type":"Advanced","router.name":"r-6578-VM","router.ip":"169.254.3.202","router.guest.ip":"10.244.0.1"},"wait":0}}] } to 33862771676063
2014-09-04 10:00:55,319 DEBUG [c.c.a.t.Request] (UserVm-Scavenger-10:ctx-2a7eee77) Seq 14-202943458208397153: Received:  { Ans: , MgmtId: 182571079363322, via: 14, Ver: v1, Flags: 0, { GroupAnswer, Answer } }
2014-09-04 10:00:55,319 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to create static nat rule due to
2014-09-04 10:00:55,325 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Unable to revoke all static nat rules for ip Ip[100.100.100.242-1]
2014-09-04 10:00:55,325 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to disable one to one nat for the ip address id235
2014-09-04 10:00:55,325 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to disable static nat for ip address Ip[100.100.100.242-1] as a part of vm id=6715 expunge
2014-09-04 10:00:55,325 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to cleanup resources as a part of vm VM[User|i-258-6715-VM] expunge
2014-09-04 10:00:55,326 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Unable to expunge VM[User|i-258-6715-VM]
{code}

The two exceptions that also happen but aren't visible with the above grep:

{code}
2014-09-04 10:00:55,319 WARN  [c.c.n.r.RulesManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Failed to create static nat rule due to
com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Unable to apply ip association on router
    at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:4018)
    at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.associatePublicIP(VirtualNetworkApplianceManagerImpl.java:3812)
    at com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl.associatePublicIP(VpcVirtualNetworkApplianceManagerImpl.java:496)
    at sun.reflect.GeneratedMethodAccessor392.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy206.associatePublicIP(Unknown Source)
    at com.cloud.network.element.VirtualRouterElement.applyIps(VirtualRouterElement.java:473)
    at com.cloud.network.IpAddressManagerImpl.applyIpAssociations(IpAddressManagerImpl.java:977)
    at com.cloud.network.IpAddressManagerImpl.applyStaticNats(IpAddressManagerImpl.java:1750)
    at com.cloud.network.rules.RulesManagerImpl.applyStaticNatForIp(RulesManagerImpl.java:1325)
    at com.cloud.network.rules.RulesManagerImpl.revokeAllPFAndStaticNatRulesForIp(RulesManagerImpl.java:1104)
    at com.cloud.network.rules.RulesManagerImpl.disableStaticNat(RulesManagerImpl.java:1269)
    at sun.reflect.GeneratedMethodAccessor346.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy122.disableStaticNat(Unknown Source)
    at com.cloud.vm.UserVmManagerImpl.cleanupVmResources(UserVmManagerImpl.java:1742)
    at com.cloud.vm.UserVmManagerImpl.expunge(UserVmManagerImpl.java:1681)
    at com.cloud.vm.UserVmManagerImpl.expungeVm(UserVmManagerImpl.java:3683)
    at com.cloud.vm.UserVmManagerImpl$ExpungeTask.runInContext(UserVmManagerImpl.java:1817)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

2014-09-04 10:00:55,326 WARN  [c.c.v.UserVmManagerImpl] (UserVm-Scavenger-10:ctx-2a7eee77) Unable to expunge VM[User|i-258-6715-VM]
com.cloud.utils.exception.CloudRuntimeException: Failed to expunge vm with specified vmId
    at com.cloud.vm.UserVmManagerImpl.expungeVm(UserVmManagerImpl.java:3687)
    at com.cloud.vm.UserVmManagerImpl$ExpungeTask.runInContext(UserVmManagerImpl.java:1817)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}

The i-258-6715-VM vm mentioned is stuck in the state 'Expunging'.    If I log in to the virtual router for this network, I do still see the .242 IP listed when I do 'ip addr show' so it indeed did not get removed from the VR.   The IP was/is a static_nat for the vm trying to be expunged and also seems to be stuck in a 'Releasing' state.

I do not, however, see any nat rules for this public ip in 'iptables -t nat -vnL'.

Rebooting the virtual router might fix this, but I'm not sure what other steps to take from here to get a permanent fix in place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)