Posted to users@cloudstack.apache.org by Chen Zhang <ia...@gmail.com> on 2018/02/23 14:00:36 UTC

CPVM/SSVM agent state disconnects

Hello,


I am new to the list and I am stuck with a very annoying issue on the
CPVM/SSVM.


When I start cloudstack-management, everything is fine. After around 3-4
hours, the agent state of the CPVM/SSVM automatically turns to
"Disconnected" and the secondary storage shows "0kb/0kb", but the VM state
is still "running". Once I manually reboot the CPVM/SSVM, the agent state
turns back to "up" and the secondary storage comes back as well. After
another 3-4 hours, the issue repeats.


Here is the log when SSVM/CPVM goes down:


----
2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
(AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following agents
behind on ping: [3]
2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
(AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for
CPVM/SSVM due to physical connection close. host: 3
2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting
with event ShutdownRequested
2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of agent
3is Disconnected, current status is Up
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for 3
with state Disconnected
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222:
Sending disconnect to class com.cloud.agent.manager.SynchronousListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.listener.StoragePoolMonitor
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.secondary.SecondaryStorageListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.security.SecurityGroupListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Waiting some more time because this is the current command
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.deploy.DeploymentPlanningManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.SshKeysDistriMonitor
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.consoleproxy.ConsoleProxyListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Waiting some more time because this is the current command
2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions
2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Timed out on null
2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Cancelling.
2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command,
due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
906630899985023222 to Host 3 timed out after 3600
2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve
storage stats
com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
906630899985023222 to Host 3 timed out after 3600
at
org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
at
com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener:
com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3
with status Disconnected
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.capacity.StorageCapacityListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.capacity.ComputeCapacityListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.SshKeysDistriMonitor
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.LocalStoragePoolListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.upload.UploadListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.download.DownloadListener
2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3)
(logid:d6a36e24) Transition:[Resource state = Enabled, Agent event =
ShutdownRequested, Host id = 3, name = s-1-VM]
2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to
disconnect
----
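
To confirm the 3-4 hour cadence described above, the "behind on ping"
entries can be pulled out of the management server log and the gaps between
them measured. What follows is a minimal sketch, not part of CloudStack: it
assumes Python 3, it assumes each log entry sits on a single line in the log
file (the excerpt above is wrapped by the mail client), and the function
names are illustrative.

```python
import re
from datetime import datetime

# Matches the AgentMonitor entry that lists agents which missed their pings.
# Assumption: one log entry per line, as in the server's own log file.
BEHIND_ON_PING = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"Found the following agents behind on ping: \[([^\]]*)\]"
)


def behind_on_ping_events(lines):
    """Return (timestamp, [host ids]) for every 'behind on ping' entry."""
    events = []
    for line in lines:
        match = BEHIND_ON_PING.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            ids = [int(x) for x in match.group(2).split(",") if x.strip()]
            events.append((when, ids))
    return events


def gaps_in_hours(events, host_id):
    """Hours between consecutive events that mention host_id."""
    times = [when for when, ids in events if host_id in ids]
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
```

Feeding the open log file to `behind_on_ping_events` and then calling
`gaps_in_hours(events, 3)` would give the spacing between drops for host 3,
the SSVM in the log above.
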

When the issue arises, all instances, hosts, and other resources are
running fine. I just updated cloudstack-management and cloudstack-agent
to 4.11, but the problem is still there. Any ideas?


Thanks!

Chen

Re: CPVM/SSVM agent state disconnects

Posted by Dag Sonstebo <Da...@shapeblue.com>.
What is your global setting for "host" set to?

Regards, 
Dag Sonstebo
Cloud Architect
ShapeBlue
S: +44 20 3603 0540 | dag.sonstebo@shapeblue.com | http://www.shapeblue.com | Twitter: @ShapeBlue


On 27/02/2018, 16:40, "Chen Zhang" <ia...@gmail.com> wrote:

    Hi Dag,
    
    Yes, the VR is online all the time. I checked the cloud.log inside the
    system VM; here is the problem:
    
    2018-02-27 09:01:23,345 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-5:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:02:23,650 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-2:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:03:23,915 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-4:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:04:24,240 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-3:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:05:24,507 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-1:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:06:24,773 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-5:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:07:25,296 INFO
    [storage.resource.NfsSecondaryStorageResource]
    (agentRequest-Handler-2:null) Determined host 192.168.1.101 corresponds to
    IP 192.168.1.101
    2018-02-27 09:09:57,210 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
    Lost connection to the server. Dealing with the remaining commands...
    2018-02-27 09:09:57,218 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
    NioClient connection closed
    2018-02-27 09:09:57,218 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
    Reconnecting to host:129.*.*.*
    2018-02-27 09:09:57,219 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
    Connecting to 129.*.*.*:8250
    2018-02-27 09:09:57,228 ERROR [utils.nio.NioConnection]
    (Agent-Handler-2:null) Unable to initialize the threads.
    java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:454)
    at sun.nio.ch.Net.connect(Net.java:446)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
    at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
    at com.cloud.agent.Agent.reconnect(Agent.java:442)
    at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1014)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    2018-02-27 09:09:57,249 INFO  [utils.exception.CSExceptionErrorCode]
    (Agent-Handler-2:null) Could not find exception:
    com.cloud.utils.exception.NioConnectionException in error code list for
    exceptions
    2018-02-27 09:09:57,250 WARN  [cloud.agent.Agent] (Agent-Handler-2:null)
    NIO Connection Exception  com.cloud.utils.exception.NioConnectionException:
    No route to host
    2018-02-27 09:09:57,250 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
    Attempted to connect to the server, but received an unexpected exception,
    trying again...
    2018-02-27 09:09:57,250 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
    NioClient connection closed
    
    The management server has two IPs, the public IP (129.*.*.*) and a local IP
    (192.168.1.101). Port 8250 is restricted on the public IP, so it cannot be
    reached there. I use the local IP as the cluster node IP and host IP in all
    agents, so I do not understand why the system VM suddenly disconnects from
    the local IP and starts connecting to the public IP. Is there any way to
    pin it to the local IP?
    
    Thanks!
    Chen
    


Dag.Sonstebo@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 


Re: CPVM/SSVM agent state disconnects

Posted by Chen Zhang <ia...@gmail.com>.
Hi Dag,

Yes, the VR is online all the time. I checked the cloud.log inside the
system VM; here is the problem:

2018-02-27 09:01:23,345 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-5:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:02:23,650 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-2:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:03:23,915 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-4:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:04:24,240 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-3:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:05:24,507 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-1:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:06:24,773 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-5:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:07:25,296 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-2:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:09:57,210 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
2018-02-27 09:09:57,218 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
NioClient connection closed
2018-02-27 09:09:57,218 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Reconnecting to host:129.*.*.*
2018-02-27 09:09:57,219 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
Connecting to 129.*.*.*:8250
2018-02-27 09:09:57,228 ERROR [utils.nio.NioConnection]
(Agent-Handler-2:null) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
at com.cloud.agent.Agent.reconnect(Agent.java:442)
at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1014)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-27 09:09:57,249 INFO  [utils.exception.CSExceptionErrorCode]
(Agent-Handler-2:null) Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions
2018-02-27 09:09:57,250 WARN  [cloud.agent.Agent] (Agent-Handler-2:null)
NIO Connection Exception  com.cloud.utils.exception.NioConnectionException:
No route to host
2018-02-27 09:09:57,250 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Attempted to connect to the server, but received an unexpected exception,
trying again...
2018-02-27 09:09:57,250 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
NioClient connection closed

The management server has two IPs: a public IP (129.*.*.*) and a local IP
(192.168.1.101). Port 8250 is restricted on the public IP, so it cannot be
reached through that address. I use the local IP as the cluster node IP and
as the host IP in all agents, so I do not understand why the system VM keeps
dropping its connection to the local IP and then tries to connect to the
public IP. Is there any way to pin the system VMs to the local IP?
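
The reachability difference described above can be confirmed with a plain TCP connect test run from inside a system VM. This is only a minimal sketch, assuming Python 3 is available there; the function name `can_connect` is made up for illustration and is not part of CloudStack.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", DNS failures
        # and timeouts, which are all OSError subclasses in Python 3.
        return False
```

For example, `can_connect("192.168.1.101", 8250)` tests the local management IP from this message. The public address is masked in the log, so it has to be filled in by hand before testing it the same way.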

Thanks!
Chen

Re: CPVM/SSVM agent state disconnects

Posted by Dag Sonstebo <Da...@shapeblue.com>.
Do VRs stay online and connected?

What you need to do next is check your cloud.log on the system VMs, possibly also up the verbosity level in the logs to catch why they are dropping comms.
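
One way to act on this is to pull the connection targets and the error count out of the agent log. The sketch below is a rough illustration that relies only on the log layout quoted in this thread (timestamp, level, bracketed logger name, then the message, which may be wrapped onto the next line). The function name `summarize` is hypothetical, and the log text is passed in as a string because the exact file location on the system VM is not stated here.

```python
import re

# "Connecting to <addr>:<port>" as printed by utils.nio.NioClient in the quoted log.
# The capital C keeps "Reconnecting to host:..." lines from matching.
CONNECT_RE = re.compile(r"Connecting to (\S+):(\d+)")
# Level field after the timestamp, e.g. "2018-02-27 09:09:57,228 ERROR [".
LEVEL_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(\w+)\s+\[")


def summarize(text):
    """Return (connect_targets, problem_count) for a chunk of agent log text."""
    targets = [(m.group(1), int(m.group(2))) for m in CONNECT_RE.finditer(text)]
    problems = 0
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m and m.group(1) in ("ERROR", "WARN"):
            problems += 1
    return targets, problems
```

Running `summarize(open(path).read())` on the agent log then shows at a glance which address and port the agent tried to reach and how many WARN or ERROR entries surround those attempts.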

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

Dag.Sonstebo@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue

Re: CPVM/SSVM agent state disconnects

Posted by Chen Zhang <ia...@gmail.com>.
Hi Dag,

Yes, I did destroy and recreate the system VMs. The version is "Cloudstack
release 4.11.0".

Thanks!
Chen

On Fri, Feb 23, 2018 at 9:27 AM, Dag Sonstebo <Da...@shapeblue.com>
wrote:

> Hi Chen,
>
> You say you just upgraded to 4.11 – did you destroy your system VMs and
> let them recreate after the upgrade?
>
> Can you also check what version a “cat /etc/cloudstack-release” shows up
> with on your SSVM/CPVM?
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 23/02/2018, 14:00, "Chen Zhang" <ia...@gmail.com> wrote:
>
>     Hello,
>
>
>     I am new in the list and I am stuck with a very annoying issue on
>     CPVM/SSVM.
>
>
>     When I start the Cloudstack-management, everything is good. After
> around 3-4
>     <outlook-data-detector://0> hours, the agent state of CPVM/SSVM
>     automatically turns to "Disconnected" and the secondary storage goes to
>     "0kb/0kb", but the VM state is still "running". Once manually rebooting
>     CPVM/SSVM, the agent state would turn back to "up" and the secondary
>     storage would be back as well. After 3-4 hours, the issue repeats
> again.
>
>
>     Here is the log when SSVM/CPVM goes down:
>
>
>     ----
>     2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
>     (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following
> agents
>     behind on ping: [3]
>     2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
>     (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for
>     CPVM/SSVM due to physical connection close. host: 3
>     2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting
>     with event ShutdownRequested
>     2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of
> agent
>     3is Disconnected, current status is Up
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for
> 3
>     with state Disconnected
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq
> 3-906630899985023222:
>     Sending disconnect to class com.cloud.agent.manager.
> SynchronousListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.hypervisor.xenserver.discoverer.
> XcpServerDiscoverer
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.hypervisor.hyperv.discoverer.
> HypervServerDiscoverer
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.listener.StoragePoolMonitor
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: org.apache.cloudstack.engine.orchestration.
> NetworkOrchestrator
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.secondary.SecondaryStorageListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.security.SecurityGroupListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
> 3-906630899985023222:
>     Waiting some more time because this is the current command
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.deploy.DeploymentPlanningManagerImpl
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.SshKeysDistriMonitor
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.consoleproxy.ConsoleProxyListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
>     Waiting some more time because this is the current command
>     2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception:
>     com.cloud.exception.OperationTimedoutException in error code list for
>     exceptions
>     2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
>     Timed out on null
>     2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
>     Cancelling.
>     2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command,
>     due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
>     906630899985023222 to Host 3 timed out after 3600
>     2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve
>     storage stats
>     com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
>     due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
>     906630899985023222 to Host 3 timed out after 3600
>     at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
>     at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener:
>     com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3
>     with status Disconnected
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.capacity.StorageCapacityListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.capacity.ComputeCapacityListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.SshKeysDistriMonitor
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.LocalStoragePoolListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.upload.UploadListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.download.DownloadListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3)
>     (logid:d6a36e24) Transition:[Resource state = Enabled, Agent event =
>     ShutdownRequested, Host id = 3, name = s-1-VM]
>     2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to
>     disconnect
>     ----
>
>     When the issue arises, all instances, hosts, and other resources are
>     running fine. I just updated cloudstack-management and cloudstack-agent
>     to 4.11, but the problem is still there. Any ideas?
>
>
>     Thanks!
>
>     Chen
>
>
>
> Dag.Sonstebo@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
> @shapeblue
>
>
>
>

Re: CPVM/SSVM agent state disconnects

Posted by Dag Sonstebo <Da...@shapeblue.com>.
Hi Chen,

You say you just upgraded to 4.11 – did you destroy your system VMs and let them recreate after the upgrade?

Can you also check which version “cat /etc/cloudstack-release” reports on your SSVM/CPVM?
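
As a rough illustration of that comparison, here is a minimal Python sketch. It assumes the release file contains a dotted version number somewhere in its text; the sample strings and the helper names parse_release and same_release_line are hypothetical, not taken from CloudStack itself.

```python
import re


def parse_release(text):
    """Return the first dotted version number found in text, or None."""
    match = re.search(r"\b(\d+(?:\.\d+)+)\b", text)
    return match.group(1) if match else None


def same_release_line(systemvm_release, management_version):
    """True when both strings agree on their major.minor components."""
    sv = parse_release(systemvm_release)
    mv = parse_release(management_version)
    if sv is None or mv is None:
        return False
    return sv.split(".")[:2] == mv.split(".")[:2]


# Hypothetical inputs: paste the real output of the cat command here.
systemvm = "Cloudstack Release 4.9.3 Tue Aug 1 2017"
management = "4.11.0"
print(parse_release(systemvm), same_release_line(systemvm, management))
```

If the two do not agree, that would suggest the system VMs are still running the pre-upgrade template.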

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 23/02/2018, 14:00, "Chen Zhang" <ia...@gmail.com> wrote:

    Hello,
    
    
    I am new to the list and I am stuck with a very annoying issue on
    CPVM/SSVM.
    
    
    When I start cloudstack-management, everything is good. After around 3-4
    hours, the agent state of the CPVM/SSVM automatically turns to
    "Disconnected" and the secondary storage goes to "0kb/0kb", but the VM
    state is still "running". After I manually reboot the CPVM/SSVM, the agent
    state turns back to "up" and the secondary storage comes back as well.
    After 3-4 hours, the issue repeats.
    
    
    Here is the log when SSVM/CPVM goes down:
    
    
    ----
    2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
    (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following agents
    behind on ping: [3]
    2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
    (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for
    CPVM/SSVM due to physical connection close. host: 3
    2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting
    with event ShutdownRequested
    2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of agent
    3is Disconnected, current status is Up
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for 3
    with state Disconnected
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222:
    Sending disconnect to class com.cloud.agent.manager.SynchronousListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.listener.StoragePoolMonitor
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.secondary.SecondaryStorageListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.security.SecurityGroupListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Waiting some more time because this is the current command
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.deploy.DeploymentPlanningManagerImpl
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.SshKeysDistriMonitor
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.consoleproxy.ConsoleProxyListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Waiting some more time because this is the current command
    2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception:
    com.cloud.exception.OperationTimedoutException in error code list for
    exceptions
    2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Timed out on null
    2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Cancelling.
    2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command,
    due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
    906630899985023222 to Host 3 timed out after 3600
    2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve
    storage stats
    com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
    due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
    906630899985023222 to Host 3 timed out after 3600
    at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
    at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener:
    com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
    2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3
    with status Disconnected
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.capacity.StorageCapacityListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.capacity.ComputeCapacityListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.SshKeysDistriMonitor
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.LocalStoragePoolListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.upload.UploadListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.download.DownloadListener
    2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3)
    (logid:d6a36e24) Transition:[Resource state = Enabled, Agent event =
    ShutdownRequested, Host id = 3, name = s-1-VM]
    2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to
    disconnect
    ----
    
    When the issue arises, all instances, hosts, and other resources are
    running fine. I just updated cloudstack-management and cloudstack-agent
    to 4.11, but the problem is still there. Any ideas?
    
    
    Thanks!
    
    Chen
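
To check whether the disconnects really recur every 3-4 hours, the "behind on ping" entries can be pulled out of the management server log. The following is a minimal Python sketch, assuming the log keeps the line format quoted above (timestamp first, entries wrapped over several lines); the function name behind_on_ping_events is hypothetical.

```python
import re
from datetime import datetime

# An entry starts with a timestamp such as "2018-02-21 15:57:47,517 ".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")
BEHIND_ON_PING = re.compile(r"behind on ping: \[([\d, ]+)\]")


def behind_on_ping_events(log_text):
    """Return a list of (timestamp, [host ids]) for 'behind on ping' entries."""
    # Join wrapped continuation lines onto the entry that carries the timestamp.
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line

    events = []
    for entry in entries:
        stamp = TIMESTAMP.match(entry)
        hit = BEHIND_ON_PING.search(entry)
        if stamp and hit:
            when = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S")
            hosts = [int(h) for h in hit.group(1).split(",")]
            events.append((when, hosts))
    return events
```

Feeding it the full log and comparing consecutive timestamps for host 3 would show whether the interval is regular, which might help narrow down a timer or timeout as the trigger.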
    

