Posted to users@cloudstack.apache.org by Ugo Vasi <ug...@procne.it> on 2015/01/26 16:05:05 UTC

agent host in alert state

Hi all,
We have installed CloudStack 4.3.0 in advanced network mode on Ubuntu
systems, with KVM as the only hypervisor.

Today we received this series of email notifications:
  1) Host disconnected, name: agent_name (id:7), availability zone: zone_name, pod: pod_name
  2) Host is down, name: agent_name (id:7), availability zone: zone_name, pod: pod_name
  3) Host disconnected, name: agent_name (id:7), availability zone: zone_name, pod: pod_name
  4) Host is down, name: agent_name (id:7), availability zone: zone_name, pod: pod_name
  5) Unable to restart vm_name which was running on host name: agent_name(id:7), availability zone: zone_name, pod: pod_name

The agent host was neither shut down nor rebooted, and the virtual
machines are still running.

In the agent log I found messages like these:

2015-01-26 15:04:40,728 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:45,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
2015-01-26 15:04:45,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:50,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
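
For anyone who needs to scan a longer agent log for the same pattern, here is a minimal Python sketch that counts the two messages quoted in this post and records how many commands were reported as still in progress. It assumes each log entry sits on a single line in the format shown here; the names summarize, LINE_RE and PENDING_RE are made up for this example and are not part of CloudStack.

```python
import re
from collections import Counter

# Assumed line layout: timestamp, level, [logger], (thread), message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<msg>.*)$"
)
# Extracts N from "... we still have N commands in progress."
PENDING_RE = re.compile(r"still have (\d+) commands in progress")


def summarize(lines):
    """Count disconnect and blocked-reconnect messages in agent log lines."""
    counts = Counter()
    pending = []  # pending-command counts, in log order
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        msg = match.group("msg")
        if msg.startswith("Lost connection to the server"):
            counts["lost_connection"] += 1
        blocked = PENDING_RE.search(msg)
        if blocked:
            counts["blocked_reconnect"] += 1
            pending.append(int(blocked.group(1)))
    return counts, pending


# The four entries quoted in this post, one entry per line.
sample = [
    "2015-01-26 15:04:40,728 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) "
    "Cannot connect because we still have 5 commands in progress.",
    "2015-01-26 15:04:45,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) "
    "Lost connection to the server. Dealing with the remaining commands...",
    "2015-01-26 15:04:45,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) "
    "Cannot connect because we still have 5 commands in progress.",
    "2015-01-26 15:04:50,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) "
    "Lost connection to the server. Dealing with the remaining commands...",
]

counts, pending = summarize(sample)
print("lost connection:", counts["lost_connection"])
print("blocked reconnects:", counts["blocked_reconnect"])
print("pending command counts:", pending)
```

A pending count that never drops, as in the entries here, would suggest that the in-progress commands are stuck rather than slowly completing.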


I restarted the agent service, and after a few minutes the log showed:

2015-01-26 15:05:42,207 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to manager_ip:8250
2015-01-26 15:05:42,489 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
2015-01-26 15:05:42,489 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to manager_ip:8250

But in the management server interface, this host is still shown in the Alert state.

Any ideas on how to resolve this problem?


-- 

   U g o   V a s i    <ug...@procne.it>
   P r o c n e  s.r.l    >)
   via Cotonificio 45  33010 Tavagnacco IT
   phone: +390432486523 fax: +390432486523

The information contained in this message is private and
confidential, and its dissemination by any means is prohibited.
If you are not the intended recipient of this message, please
delete it without reading it and kindly notify us.
For any information, please contact support@procne.it .
Ref. D.L. 196/2003





Re: agent host in alert state

Posted by Ugo Vasi <ug...@procne.it>.
We found that the NFS secondary storage was not working properly. We
restored the NFS service and rebooted the host that was in the Alert
state. Now it works!
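
For others who hit the same symptom, here is a minimal Python sketch of one way to probe whether an NFS mount point still responds. It is not necessarily how the problem was diagnosed in this case. It assumes a Linux host; the function name mount_responds and the path /mnt/secondary are hypothetical and should be replaced with the real mount point. The probe runs in a daemon thread because a call against a hung NFS mount can block indefinitely.

```python
import os
import threading


def mount_responds(path, timeout=5.0):
    """Return True if os.statvfs(path) completes within `timeout` seconds."""
    result = {}

    def probe():
        try:
            os.statvfs(path)  # may block for a long time on a hung NFS mount
            result["ok"] = True
        except OSError:
            result["ok"] = False  # missing path, stale handle, and similar

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)  # give up waiting after `timeout` seconds
    # No entry in `result` means the probe is still blocked.
    return result.get("ok", False)


# Hypothetical mount point of the secondary storage on this host.
MOUNT_POINT = "/mnt/secondary"

if mount_responds(MOUNT_POINT):
    print(MOUNT_POINT, "responded")
else:
    print(MOUNT_POINT, "did not respond or is not accessible")
```

A False result only says that the path could not be queried in time; it does not distinguish a hung server from a missing or unmounted directory.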

On 26/01/2015 16:32, Somesh Naidu wrote:
> From the logs, it appears that the agent connected, but I can't say what happened next. We need further logs.
>
> There are quite a few things that you could check, for example:
> 1. netstat shows a connection between the management server (on port 8250) and the system VM.
> 2. The disk on the system VM hasn't run out of space.
>
> You could stop and start the VM to see whether that recovers it.
>
> You may also try various other checks, including running the health check script mentioned here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting
>
> Regards,
> Somesh





RE: agent host in alert state

Posted by Somesh Naidu <So...@citrix.com>.
From the logs, it appears that the agent connected, but I can't say what happened next. We need further logs.

There are quite a few things that you could check, for example:
1. netstat shows a connection between the management server (on port 8250) and the system VM.
2. The disk on the system VM hasn't run out of space.
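
As a rough illustration of these two checks, here is a minimal Python sketch. Note that it is not the netstat check itself: instead of listing existing connections, it tries to open a fresh TCP connection to port 8250, which only shows that the port is reachable from the machine where it runs. The function names can_connect and disk_used_fraction are made up for this example.

```python
import shutil
import socket


def can_connect(host, port=8250, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def disk_used_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

For example, can_connect("manager_ip") would be run with the real address of the management server in place of manager_ip, and a disk_used_fraction("/") close to 1.0 would indicate that the root filesystem is nearly full.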

You could stop and start the VM to see whether that recovers it.

You may also try various other checks, including running the health check script mentioned here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting

Regards,
Somesh
