Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2019/10/04 05:28:51 UTC

[GitHub] [cloudstack] nvazquez opened a new pull request #3617: [KVM] Agent LB Fix: Connections from disabled KVM host agents are refused

URL: https://github.com/apache/cloudstack/pull/3617
 
 
   ## Description
   This PR fixes a limitation in the KVM agent load-balancing feature.
   
   When a host is in the 'Disabled' state and its agent is restarted, the host cannot connect to the management server and stays in the 'Connecting' state indefinitely. The management server log shows:
   
   ````
   2019-10-04 05:23:12,242 DEBUG [c.c.r.ResourceState] (AgentConnectTaskPool-5:ctx-e3780a57) (logid:7c4cc142) Resource state update: [id = 4; name = trl-65-k-M7-nvazquez-kvm4; old state = Disabled; event = InternalCreated; new state = Disabled]
   2019-10-04 05:23:12,242 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-5:ctx-e3780a57) (logid:7c4cc142) Transition:[Resource state = Disabled, Agent event = AgentConnected, Host id = 4, name = trl-65-k-M7-nvazquez-kvm4]
   2019-10-04 05:23:12,248 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-5:ctx-e3780a57) (logid:7c4cc142) Failed to handle host connection: 
   java.lang.IndexOutOfBoundsException: fromIndex = -1
   	at java.util.SubList.<init>(AbstractList.java:620)
   	at java.util.RandomAccessSubList.<init>(AbstractList.java:775)
   	at java.util.AbstractList.subList(AbstractList.java:484)
   	at org.apache.cloudstack.agent.lb.algorithm.IndirectAgentLBRoundRobinAlgorithm.sort(IndirectAgentLBRoundRobinAlgorithm.java:44)
   	at org.apache.cloudstack.agent.lb.IndirectAgentLBServiceImpl.getManagementServerList(IndirectAgentLBServiceImpl.java:93)
   	at org.apache.cloudstack.agent.lb.IndirectAgentLBServiceImpl.compareManagementServerList(IndirectAgentLBServiceImpl.java:101)
   	at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:1094)
   	at com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:126)
   	at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1187)
   	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
   	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
   	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
   	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
   	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   2019-10-04 05:23:12,248 WARN  [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-5:ctx-e3780a57) (logid:7c4cc142) Unable to create attache for agent: Seq 0-0:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":3,"cpus":3,"speed":1599,"memory":7277121536,"dom0MinMemory":1073741824,"poolSync":false,"supportsClonedVolumes":false,"caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"3.10.0-957.12.2.el7.x86_64","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"7.6.1810","secured":"true","Host.OS":"CentOS"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"d609214d-020e-3727-8515-c58c792f1d1a-LibvirtComputingResource","name":"trl-65-k-M7-nvazquez-kvm4","id":0,"version":"4.14.0.0-SNAPSHOT","iqn":"iqn.1994-05.com.redhat:213e8ff8bef8","privateIpAddress":"10.2.2.79","privateMacAddress":"1e:00:56:01:07:47","privateNetmask":"255.255.0.0","storageIpAddress":"10.2.2.79","storageNetmask":"255.255.0.0","storageMacAddress":"1e:00:56:01:07:47","resourceName":"LibvirtComputingResource","gatewayIpAddress":"10.2.254.254","msHostList":"10.2.2.84,10.2.2.104@roundrobin","wait":0}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":0,"poolInfo":{"uuid":"a1708750-e55c-4b15-89b3-2b9cc39b6a77","host":"10.2.2.79","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":20935868416,"availableBytes":19100352512},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"d609214d-020e-3727-8515-c58c792f1d1a-LibvirtComputingResource","name":"trl-65-k-M7-nvazquez-kvm4","id":0,"version":"4.14.0.0-SNAPSHOT","resourceName":"LibvirtComputingResource","msHostList":"10.2.2.84,10.2.2.104@roundrobin","wait":0}}] }
   2019-10-04 05:23:12,263 WARN  [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-13:null) (logid:) Throwing away a request because it came through as the first command on a connect: Seq 0--1:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","wait":0}}] }
   2019-10-04 05:23:12,395 WARN  [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-14:null) (logid:) Throwing away a request because it came through as the first command on a connect: Seq 0-1:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
   2019-10-04 05:23:13,320 INFO  [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-15:null) (logid:) Connection from /10.2.2.79 closed but no cleanup was done.
   ````
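   The stack trace shows `IndirectAgentLBRoundRobinAlgorithm.sort` calling `List.subList` with `fromIndex = -1`. The PR does not include the method body, so what follows is an assumption: a round-robin algorithm that rotates the management server list by the host's position in an ordered host-ID list would fail exactly this way for a host missing from that list, because `indexOf` returns -1 and Java's `%` keeps the sign of the dividend. The minimal sketch below models that reading; `RoundRobinSketch` and its `sort` method are hypothetical and are not CloudStack's code. It further assumes the ordered list contains only 'Enabled' hosts, which leaves the disabled host 4 from the log out.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   // Hypothetical model of a round-robin rotation of the management server list.
   // NOT the CloudStack implementation; it only reproduces the failure in the log.
   public class RoundRobinSketch {

       // Rotates msList so that the host at position i in orderedHostIds
       // starts with management server (i % msList.size()).
       public static List<String> sort(List<String> msList, List<Long> orderedHostIds, Long hostId) {
           int hostIndex = orderedHostIds.indexOf(hostId); // -1 when the host is absent
           int start = hostIndex % msList.size();          // -1 % n is still -1 in Java
           List<String> rotated = new ArrayList<>(msList.subList(start, msList.size()));
           rotated.addAll(msList.subList(0, start));
           return rotated;
       }

       public static void main(String[] args) {
           List<String> ms = Arrays.asList("10.2.2.84", "10.2.2.104");
           // Assumption: only 'Enabled' hosts are in the ordered list; host 4 is disabled.
           List<Long> enabledHostIds = Arrays.asList(1L, 2L, 3L);

           System.out.println(sort(ms, enabledHostIds, 2L)); // [10.2.2.104, 10.2.2.84]
           try {
               sort(ms, enabledHostIds, 4L);
           } catch (IndexOutOfBoundsException e) {
               System.out.println("Host 4 rejected: " + e.getMessage());
           }
       }
   }
   ```

   In this model, any host ID that appears in the ordered list gets a valid, non-negative start index; only absent hosts trigger the exception.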
   
   The root cause is that, when an administrator changed the global settings `host` or `indirect.agent.lb.algorithm`, the management server's LB mechanism sent the updated values only to hosts in the 'Enabled' state. Hosts in the 'Disabled' state therefore did not receive the updated management server list or LB algorithm.
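   The propagation rule above can be modeled as a filter over host resource states. The sketch below contrasts the behavior the PR describes (only 'Enabled' hosts targeted) with an assumed fix that also targets 'Disabled' hosts. The PR does not say how other states are handled, so `MAINTENANCE` appears only as a state outside both sets; `LbUpdateTargetsSketch`, `State`, `Host`, and `targetHostIds` are hypothetical names, not CloudStack's API.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.EnumSet;
   import java.util.List;
   import java.util.Set;

   // Hypothetical sketch: choose which hosts get the updated management server
   // list and LB algorithm. Names are illustrative, not CloudStack's API.
   public class LbUpdateTargetsSketch {

       // Illustrative states only; MAINTENANCE is here just as a non-target example.
       public enum State { ENABLED, DISABLED, MAINTENANCE }

       public static final class Host {
           public final long id;
           public final State state;
           public Host(long id, State state) { this.id = id; this.state = state; }
       }

       // Behavior described in the PR: only ENABLED hosts were targeted.
       public static final Set<State> OLD_TARGETS = EnumSet.of(State.ENABLED);
       // Assumed fix: DISABLED hosts are targeted as well.
       public static final Set<State> NEW_TARGETS = EnumSet.of(State.ENABLED, State.DISABLED);

       // Returns the IDs of hosts whose state is in 'targets', preserving input order.
       public static List<Long> targetHostIds(List<Host> hosts, Set<State> targets) {
           List<Long> ids = new ArrayList<>();
           for (Host h : hosts) {
               if (targets.contains(h.state)) {
                   ids.add(h.id);
               }
           }
           return ids;
       }

       public static void main(String[] args) {
           List<Host> hosts = Arrays.asList(
                   new Host(1, State.ENABLED),
                   new Host(2, State.ENABLED),
                   new Host(4, State.DISABLED));
           System.out.println(targetHostIds(hosts, OLD_TARGETS)); // [1, 2]
           System.out.println(targetHostIds(hosts, NEW_TARGETS)); // [1, 2, 4]
       }
   }
   ```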
   
   ## Types of changes
   - [ ] Breaking change (fix or feature that would cause existing functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [x] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ## Screenshots (if appropriate):
   
   ## How Has This Been Tested?
   In a KVM environment:
   
   Case 1:
   - Disable a host H
   - Restart agent on host H
   - Verify that the host is 'Up'
   
   Case 2:
   - Disable a host H
   - Update one of the global settings: `host` or `indirect.agent.lb.algorithm`
   - Verify that the agent on host H receives the updated values of these settings
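   For Case 2, the log above shows the agent reporting its management server list and LB algorithm in the `msHostList` field of its startup commands, in the form `10.2.2.84,10.2.2.104@roundrobin`. A small parser makes it easy to compare such a value against the updated `host` and `indirect.agent.lb.algorithm` settings. The sketch below assumes the comma-and-`@` format seen in that single log line; `MsHostListSketch` and `parse` are hypothetical helpers, not part of CloudStack.

   ```java
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;

   // Hypothetical helper: splits an msHostList value such as
   // "10.2.2.84,10.2.2.104@roundrobin" into hosts and algorithm.
   // The format is inferred from one log line, not from CloudStack docs.
   public class MsHostListSketch {

       public static final class Parsed {
           public final List<String> hosts;
           public final String algorithm; // null when there is no "@" suffix
           Parsed(List<String> hosts, String algorithm) {
               this.hosts = hosts;
               this.algorithm = algorithm;
           }
       }

       public static Parsed parse(String value) {
           int at = value.lastIndexOf('@');
           String hostPart = at >= 0 ? value.substring(0, at) : value;
           String algorithm = at >= 0 ? value.substring(at + 1) : null;
           List<String> hosts = hostPart.isEmpty()
                   ? Collections.<String>emptyList()
                   : Arrays.asList(hostPart.split(","));
           return new Parsed(hosts, algorithm);
       }

       public static void main(String[] args) {
           Parsed p = parse("10.2.2.84,10.2.2.104@roundrobin");
           System.out.println(p.hosts + " " + p.algorithm); // [10.2.2.84, 10.2.2.104] roundrobin
       }
   }
   ```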
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services