Posted to issues@cloudstack.apache.org by "Alena Prokharchyk (JIRA)" <ji...@apache.org> on 2014/05/02 22:48:18 UTC
[jira] [Commented] (CLOUDSTACK-6475) [Automation] communication
between cloudstack agent and MS disconnecting continuously
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988216#comment-13988216 ]
Alena Prokharchyk commented on CLOUDSTACK-6475:
-----------------------------------------------
Update from Kelven:
I dug into the problem further. It is caused by a long-held transaction in VpcManagerImpl.java: destroyVpc() makes external calls to the agent while keeping a transaction open. When the agent reconnects to the management server, it gets a TIME-WAIT exception there.
Kelven
protected class VpcCleanupTask extends ManagedContextRunnable {
    @Override
    protected void runInContext() {
        try {
            GlobalLock lock = GlobalLock.getInternLock("VpcCleanup");
            if (lock == null) {
                s_logger.debug("Couldn't get the global lock");
                return;
            }
            if (!lock.lock(30)) {
                s_logger.debug("Couldn't lock the db");
                return;
            }
            try {
                Transaction.execute(new TransactionCallbackWithExceptionNoReturn<Exception>() {
                    @Override
                    public void doInTransactionWithoutResult(TransactionStatus status) throws Exception {
                        // Cleanup inactive VPCs
                        List<VpcVO> inactiveVpcs = _vpcDao.listInactiveVpcs();
                        s_logger.info("Found " + inactiveVpcs.size() + " removed VPCs to cleanup");
                        for (VpcVO vpc : inactiveVpcs) {
                            s_logger.debug("Cleaning up " + vpc);
                            destroyVpc(vpc, _accountMgr.getAccount(Account.ACCOUNT_ID_SYSTEM), User.UID_SYSTEM);
                        }
                    }
                });
            } catch (Exception e) {
                s_logger.error("Exception ", e);
            } finally {
                lock.unlock();
            }
        } catch (Exception e) {
            s_logger.error("Exception ", e);
        }
    }
}
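
For illustration only, here is a minimal standalone sketch of one way to avoid holding a transaction open across agent calls: read the list of inactive VPCs inside a short transaction, then call the destroy routine for each VPC after that transaction has closed. This is not the committed fix and does not use CloudStack classes. The names VpcCleanupSketch, inTransaction, txOpen, and events are hypothetical stand-ins, and the sketch assumes the per-VPC destroy step can manage its own database work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class VpcCleanupSketch {
    // Hypothetical flag standing in for "a DB transaction is currently open".
    static boolean txOpen = false;
    // Records what each destroy call observed, so the behavior can be inspected.
    static final List<String> events = new ArrayList<>();

    // Hypothetical stand-in for a transaction helper: opens, runs the work, always closes.
    static <T> T inTransaction(Supplier<T> work) {
        txOpen = true;
        try {
            return work.get();
        } finally {
            txOpen = false;
        }
    }

    // Hypothetical stand-in for destroyVpc(): in the real code this talks to the agent.
    static void destroyVpc(String vpc) {
        events.add(vpc + ":txOpen=" + txOpen);
    }

    static void cleanupInactiveVpcs(List<String> stored) {
        // Short transaction: only the read happens while the transaction is open.
        List<String> inactive = inTransaction(() -> new ArrayList<>(stored));

        // Slow external calls run after the transaction has been closed.
        for (String vpc : inactive) {
            try {
                destroyVpc(vpc);
            } catch (RuntimeException e) {
                // One failing VPC should not stop cleanup of the others.
                events.add(vpc + ":failed");
            }
        }
    }

    public static void main(String[] args) {
        cleanupInactiveVpcs(Arrays.asList("vpc-1", "vpc-2"));
        System.out.println(events);
    }
}
```

In this sketch every destroy call records txOpen=false, meaning no transaction is held while the (simulated) agent call runs. Whether the global lock should still wrap the whole loop is a separate design choice that the sketch does not address.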
> [Automation] communication between cloudstack agent and MS disconnecting continuously
> ---------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-6475
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6475
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: KVM
> Affects Versions: 4.4.0
> Environment: RHEL 6.3
> Reporter: Rayees Namathponnan
> Assignee: Alena Prokharchyk
> Fix For: 4.4.0
>
> Attachments: Agent_log.rar, management-server.rar
>
>
> This issue is observed during automation runs.
> Communication between the CloudStack agent and the MS is disconnecting continuously; the error below is observed in the agent log:
> 2014-04-22 04:08:47,867 INFO [cloud.agent.Agent] (AgentShutdownThread:null) Stopping the agent: Reason = sig.kill
> 2014-04-22 04:10:41,456 INFO [cloud.agent.AgentShell] (main:null) Agent started
> 2014-04-22 04:10:41,500 INFO [cloud.agent.AgentShell] (main:null) Implementation Version is 4.4.0-SNAPSHOT
> 2014-04-22 04:10:41,502 INFO [cloud.agent.AgentShell] (main:null) agent.properties found at /etc/cloudstack/agent/agent.properties
> 2014-04-22 04:10:41,551 INFO [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
> 2014-04-22 04:10:41,552 INFO [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
> 2014-04-22 04:10:41,572 INFO [cloud.utils.LogUtils] (main:null) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
> 2014-04-22 04:10:41,722 INFO [cloud.agent.Agent] (main:null) id is 0
> 2014-04-22 04:10:42,501 INFO [kvm.resource.LibvirtComputingResource] (main:null) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
> 2014-04-22 04:10:42,590 INFO [cloud.agent.Agent] (main:null) Agent [id = 0 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 10.223.49.195 : port = 8250
> 2014-04-22 04:10:42,664 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:10:42,920 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2014-04-22 04:10:42,920 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.223.49.195:8250
> 2014-04-22 04:10:42,941 WARN [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) Could not read cpuinfo_max_freq
> 2014-04-22 04:10:43,158 INFO [cloud.serializer.GsonHelper] (Agent-Handler-1:null) Default Builder inited.
> 2014-04-22 04:10:43,227 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
> 2014-04-22 04:10:43,227 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
> 2014-04-22 04:10:43,233 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
> 2014-04-22 04:11:40,925 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
> 2014-04-22 04:11:42,352 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 0
> 2014-04-22 04:11:42,368 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 1
> 2014-04-22 04:11:42,383 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 2
> 2014-04-22 04:11:42,398 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 3
> 2014-04-22 04:11:42,413 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 4
> 2014-04-22 04:11:42,414 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error; reboot the host
> 2014-04-22 04:11:42,472 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 0
> 2014-04-22 04:11:42,487 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 1
> 2014-04-22 04:11:42,507 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 2
> 2014-04-22 04:11:42,527 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 3
> 2014-04-22 04:11:42,542 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 4
> 2014-04-22 04:11:42,542 WARN [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error; reboot the host
> 2014-04-22 04:11:45,926 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
> 2014-04-22 04:11:45,927 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:11:45,929 ERROR [utils.nio.NioConnection] (Agent-Selector:null) Unable to initialize the threads.
> java.net.SocketException: Network is unreachable
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:666)
> at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
> at com.cloud.utils.nio.NioConnection.run(NioConnection.java:111)
> at java.lang.Thread.run(Thread.java:744)
> 2014-04-22 04:11:46,789 INFO [cloud.agent.Agent] (AgentShutdownThread:null) Stopping the agent: Reason = sig.kill
> 2014-04-22 04:13:43,347 INFO [cloud.agent.AgentShell] (main:null) Agent started
> 2014-04-22 04:13:43,392 INFO [cloud.agent.AgentShell] (main:null) Implementation Version is 4.4.0-SNAPSHOT
> 2014-04-22 04:13:43,394 INFO [cloud.agent.AgentShell] (main:null) agent.properties found at /etc/cloudstack/agent/agent.properties
> 2014-04-22 04:13:43,442 INFO [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
> 2014-04-22 04:13:43,444 INFO [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
> 2014-04-22 04:13:43,464 INFO [cloud.utils.LogUtils] (main:null) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
> 2014-04-22 04:13:43,616 INFO [cloud.agent.Agent] (main:null) id is 0
> 2014-04-22 04:13:44,442 INFO [kvm.resource.LibvirtComputingResource] (main:null) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
> 2014-04-22 04:13:44,539 INFO [cloud.agent.Agent] (main:null) Agent [id = 0 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 10.223.49.195 : port = 8250
> 2014-04-22 04:13:44,606 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:13:44,869 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2014-04-22 04:13:44,870 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.223.49.195:8250
> 2014-04-22 04:13:44,892 WARN [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) Could not read cpuinfo_max_freq
> 2014-04-22 04:13:45,099 INFO [cloud.serializer.GsonHelper] (Agent-Handler-1:null) Default Builder inited.
> 2014-04-22 04:13:45,166 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
> 2014-04-22 04:13:45,166 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
> 2014-04-22 04:13:45,174 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
> 2014-04-22 04:14:43,163 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2014-04-22 04:14:48,164 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Reconnecting...
> 2014-04-22 04:14:48,164 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:14:48,257 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2014-04-22 04:14:48,257 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.223.49.195:8250
>