Posted to users@cloudstack.apache.org by Ugo Vasi <ug...@procne.it> on 2015/01/26 16:05:05 UTC
agent host in alert state
Hi all,
we have installed CloudStack 4.3.0 in advanced network mode on Ubuntu
systems, with KVM as the only hypervisor.
Today we received this series of notifications (by email):
1) Host disconnected, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
2) Host is down, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
3) Host disconnected, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
4) Host is down, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
5) Unable to restart vm_name which was running on host name:
agent_name(id:7), availability zone: zone_name, pod: pod_name
The agent server was neither shut down nor rebooted, and the virtual
machines are still running.
In the agent log I found messages like these:
2015-01-26 15:04:40,728 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:45,729 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
2015-01-26 15:04:45,729 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:50,729 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
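One way to see whether the number of stuck commands ever changes is to pull the timestamp and count out of each "Cannot connect" entry. The following is only a sketch: it assumes entries shaped like the ones quoted above (wrapped across two lines, as in this archive), and the function name and sample text are illustrative.

```python
import re
from datetime import datetime

# Matches the "Cannot connect" entries quoted above. \s+ tolerates the line
# wrap between the thread name and the message text.
PENDING = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+INFO\s+\[cloud\.agent\.Agent\]"
    r"\s+\(([^)]*)\)\s+Cannot connect because we still have (\d+) commands in progress"
)

def pending_commands(log_text):
    """Return (timestamp, pending_count) pairs found in the agent log text."""
    return [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), int(count))
        for ts, _thread, count in PENDING.findall(log_text)
    ]

# Sample copied from the log excerpt in this message.
sample = """2015-01-26 15:04:40,728 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:45,729 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
2015-01-26 15:04:45,729 INFO [cloud.agent.Agent] (Agent-Handler-2:null)
Cannot connect because we still have 5 commands in progress.
"""

for when, count in pending_commands(sample):
    print(when, count)
```

A count that never drops may suggest the commands are blocked on something rather than finishing slowly, though the log alone does not say what they are waiting for.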
I tried restarting the agent service, and after a few minutes the log says:
2015-01-26 15:05:42,207 INFO [utils.nio.NioClient]
(Agent-Selector:null) Connecting to manager_ip:8250
2015-01-26 15:05:42,489 INFO [utils.nio.NioClient]
(Agent-Selector:null) SSL: Handshake done
2015-01-26 15:05:42,489 INFO [utils.nio.NioClient]
(Agent-Selector:null) Connected to manager_ip:8250
But in the management interface I still see this agent in Alert state.
Any idea how to resolve this problem?
--
U g o V a s i <ug...@procne.it>
P r o c n e s.r.l >)
via Cotonificio 45 33010 Tavagnacco IT
phone: +390432486523 fax: +390432486523
The information contained in this message is private and
confidential, and its dissemination in any form is prohibited.
If you are not the intended recipient of this message, please
delete it without reading it and kindly notify us.
For any information, please contact support@procne.it .
Ref. D.L. 196/2003
Re: agent host in alert state
Posted by Ugo Vasi <ug...@procne.it>.
We found that the secondary storage NFS was not working properly. We
restored the NFS service and rebooted the machine in Alert state. Now it works!
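For anyone hitting the same symptom, a quick way to tell whether a mount has stopped responding is to probe it with a timeout, since a plain `ls` or `df` on a hung NFS mount can block indefinitely. This is a minimal sketch, not something taken from CloudStack: the function name and the mount path are hypothetical.

```python
import os
import threading

def path_responds(path, timeout=5.0):
    """Return True if os.statvfs(path) completes within `timeout` seconds."""
    outcome = {}

    def probe():
        try:
            os.statvfs(path)
            outcome["ok"] = True
        except OSError:
            # Path missing, stale handle, permission problem, etc.
            outcome["ok"] = False

    # Daemon thread: a probe stuck on a hung mount will not block interpreter exit.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    # If the probe never finished, "ok" is absent and we report False.
    return outcome.get("ok", False)

# Hypothetical mount point; substitute wherever secondary storage is mounted.
print(path_responds("/mnt/secondary"))
```

A False result only says the path did not answer in time or raised an error; it does not identify the cause.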
On 26/01/2015 16:32, Somesh Naidu wrote:
> From the logs it appears that the agent connected, but I can't say what happened next. Further logs are needed.
>
> There are quite a few things that you could check, such as:
> 1. netstat shows a connection between the management server (on port 8250) and the system VM.
> 2. The disk on the system VM hasn't run out of space.
>
> You could stop and start the VM to see whether that recovers the situation.
>
> You may also try various other checks, including running the health check script mentioned here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting.
>
> Regards,
> Somesh
RE: agent host in alert state
Posted by Somesh Naidu <So...@citrix.com>.
From the logs it appears that the agent connected, but I can't say what happened next. Further logs are needed.
There are quite a few things that you could check, such as:
1. netstat shows a connection between the management server (on port 8250) and the system VM.
2. The disk on the system VM hasn't run out of space.
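The two checks above can be approximated with a short script run on the system VM (or on the agent host). This is a rough sketch with hypothetical helper names: unlike netstat, it opens a new TCP connection to port 8250 rather than inspecting an existing one, and the management server address is passed on the command line.

```python
import shutil
import socket
import sys

def can_reach(host, port=8250, timeout=5.0):
    """Return True if a new TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

def free_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

if __name__ == "__main__":
    # Usage (script name is illustrative): python check_host.py <management server address>
    if len(sys.argv) > 1:
        print("port 8250 reachable:", can_reach(sys.argv[1]))
    print("free space on /: {:.0%}".format(free_fraction("/")))
```

A successful connection only shows that the port is reachable from this machine; it does not prove that the agent itself is connected.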
You could stop and start the VM to see whether that recovers the situation.
You may also try various other checks, including running the health check script mentioned here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting.
Regards,
Somesh