Posted to issues@cloudstack.apache.org by "Alena Prokharchyk (JIRA)" <ji...@apache.org> on 2014/05/02 22:48:18 UTC

[jira] [Commented] (CLOUDSTACK-6475) [Automation] communication between cloudstack agent and MS disconnecting continuously

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988216#comment-13988216 ] 

Alena Prokharchyk commented on CLOUDSTACK-6475:
-----------------------------------------------

Update from Kelven:

I dug into the problem further. It is caused by a long-running, lock-holding transaction in VpcManagerImpl.java: destroyVpc() makes external calls to the agent while it keeps a transaction open. When the agent reconnects to the management server, it gets a TIME-WAIT exception there.
 
Kelven
 
 
    protected class VpcCleanupTask extends ManagedContextRunnable {
        @Override
        protected void runInContext() {
            try {
                GlobalLock lock = GlobalLock.getInternLock("VpcCleanup");
                if (lock == null) {
                    s_logger.debug("Couldn't get the global lock");
                    return;
                }
 
                if (!lock.lock(30)) {
                    s_logger.debug("Couldn't lock the db");
                    return;
                }
 
                try {
                    Transaction.execute(new TransactionCallbackWithExceptionNoReturn<Exception>() {
                        @Override
                        public void doInTransactionWithoutResult(TransactionStatus status) throws Exception {
                            // Cleanup inactive VPCs
                            List<VpcVO> inactiveVpcs = _vpcDao.listInactiveVpcs();
                            s_logger.info("Found " + inactiveVpcs.size() + " removed VPCs to cleanup");
                            for (VpcVO vpc : inactiveVpcs) {
                                s_logger.debug("Cleaning up " + vpc);
                                // destroyVpc() makes external calls to the agent while this transaction is still open
                                destroyVpc(vpc, _accountMgr.getAccount(Account.ACCOUNT_ID_SYSTEM), User.UID_SYSTEM);
                            }
                        }
                    });
                } catch (Exception e) {
                    s_logger.error("Exception ", e);
                } finally {
                    lock.unlock();
                }
            } catch (Exception e) {
                s_logger.error("Exception ", e);
            }
        }
    }
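
For illustration only, here is a minimal sketch of one way to keep the agent calls out of the transaction: read the list of inactive VPCs inside a short transaction, then run the per-VPC cleanup after it has closed. FakeTransaction, VpcCleanupSketch and cleanupInactiveVpcs are hypothetical stand-ins, not CloudStack classes, and this is not necessarily the fix that was applied for this issue. The global lock from the snippet above is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a DB transaction helper; NOT CloudStack's Transaction API.
final class FakeTransaction {
    private static boolean open = false;

    static boolean isOpen() {
        return open;
    }

    // Runs the body while the "transaction" is open and always closes it afterwards.
    static <T> T execute(Supplier<T> body) {
        open = true;
        try {
            return body.get();
        } finally {
            open = false;
        }
    }
}

// Hypothetical cleanup task; VPCs are represented by plain strings here.
final class VpcCleanupSketch {
    // Reads the work list inside a short transaction, then performs the slow
    // per-VPC cleanup (which may call out to agents) after it has closed.
    // Returns the number of VPCs cleaned up without an exception.
    static int cleanupInactiveVpcs(Supplier<List<String>> listInactiveVpcs,
                                   Consumer<String> destroyVpc) {
        // Short transaction: only the DB read happens here.
        List<String> inactive = FakeTransaction.execute(
                () -> new ArrayList<String>(listInactiveVpcs.get()));

        int cleaned = 0;
        for (String vpc : inactive) {
            try {
                // No transaction is open at this point, so a slow agent call
                // cannot keep DB locks held.
                destroyVpc.accept(vpc);
                cleaned++;
            } catch (RuntimeException e) {
                // One failing VPC should not abort the rest of the sweep.
                System.err.println("Failed to clean up " + vpc + ": " + e);
            }
        }
        return cleaned;
    }
}
```

In real code destroyVpc() would still need to update the database; under this approach each of those updates would have to run in its own short transaction rather than in one that spans the agent calls.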
 

> [Automation] communication between cloudstack agent and MS disconnecting continuously  
> ---------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-6475
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6475
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM
>    Affects Versions: 4.4.0
>         Environment: RHEL 6.3
>            Reporter: Rayees Namathponnan
>            Assignee: Alena Prokharchyk
>             Fix For: 4.4.0
>
>         Attachments: Agent_log.rar, management-server.rar
>
>
> This issue was observed during an automation run.
> Communication between the CloudStack agent and the MS is disconnected continuously; the following errors were observed in the agent log:
>  2014-04-22 04:08:47,867 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) Stopping the agent: Reason = sig.kill
> 2014-04-22 04:10:41,456 INFO  [cloud.agent.AgentShell] (main:null) Agent started
> 2014-04-22 04:10:41,500 INFO  [cloud.agent.AgentShell] (main:null) Implementation Version is 4.4.0-SNAPSHOT
> 2014-04-22 04:10:41,502 INFO  [cloud.agent.AgentShell] (main:null) agent.properties found at /etc/cloudstack/agent/agent.properties
> 2014-04-22 04:10:41,551 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
> 2014-04-22 04:10:41,552 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
> 2014-04-22 04:10:41,572 INFO  [cloud.utils.LogUtils] (main:null) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
> 2014-04-22 04:10:41,722 INFO  [cloud.agent.Agent] (main:null) id is 0
> 2014-04-22 04:10:42,501 INFO  [kvm.resource.LibvirtComputingResource] (main:null) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
> 2014-04-22 04:10:42,590 INFO  [cloud.agent.Agent] (main:null) Agent [id = 0 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 10.223.49.195 : port = 8250
> 2014-04-22 04:10:42,664 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:10:42,920 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2014-04-22 04:10:42,920 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.223.49.195:8250
> 2014-04-22 04:10:42,941 WARN  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) Could not read cpuinfo_max_freq
> 2014-04-22 04:10:43,158 INFO  [cloud.serializer.GsonHelper] (Agent-Handler-1:null) Default Builder inited.
> 2014-04-22 04:10:43,227 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
> 2014-04-22 04:10:43,227 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
> 2014-04-22 04:10:43,233 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
> 2014-04-22 04:11:40,925 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
> 2014-04-22 04:11:42,352 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 0
> 2014-04-22 04:11:42,368 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 1
> 2014-04-22 04:11:42,383 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 2
> 2014-04-22 04:11:42,398 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 3
> 2014-04-22 04:11:42,413 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 4
> 2014-04-22 04:11:42,414 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error; reboot the host
> 2014-04-22 04:11:42,472 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 0
> 2014-04-22 04:11:42,487 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 1
> 2014-04-22 04:11:42,507 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 2
> 2014-04-22 04:11:42,527 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 3
> 2014-04-22 04:11:42,542 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error, retry: 4
> 2014-04-22 04:11:42,542 WARN  [kvm.resource.KVMHAMonitor] (Thread-4:null) write heartbeat failed: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh: line 131: echo: write error: Input/output error; reboot the host
> 2014-04-22 04:11:45,926 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
> 2014-04-22 04:11:45,927 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:11:45,929 ERROR [utils.nio.NioConnection] (Agent-Selector:null) Unable to initialize the threads.
> java.net.SocketException: Network is unreachable
>         at sun.nio.ch.Net.connect0(Native Method)
>         at sun.nio.ch.Net.connect(Net.java:465)
>         at sun.nio.ch.Net.connect(Net.java:457)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:666)
>         at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
>         at com.cloud.utils.nio.NioConnection.run(NioConnection.java:111)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-04-22 04:11:46,789 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) Stopping the agent: Reason = sig.kill
> 2014-04-22 04:13:43,347 INFO  [cloud.agent.AgentShell] (main:null) Agent started
> 2014-04-22 04:13:43,392 INFO  [cloud.agent.AgentShell] (main:null) Implementation Version is 4.4.0-SNAPSHOT
> 2014-04-22 04:13:43,394 INFO  [cloud.agent.AgentShell] (main:null) agent.properties found at /etc/cloudstack/agent/agent.properties
> 2014-04-22 04:13:43,442 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
> 2014-04-22 04:13:43,444 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
> 2014-04-22 04:13:43,464 INFO  [cloud.utils.LogUtils] (main:null) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
> 2014-04-22 04:13:43,616 INFO  [cloud.agent.Agent] (main:null) id is 0
> 2014-04-22 04:13:44,442 INFO  [kvm.resource.LibvirtComputingResource] (main:null) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
> 2014-04-22 04:13:44,539 INFO  [cloud.agent.Agent] (main:null) Agent [id = 0 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 10.223.49.195 : port = 8250
> 2014-04-22 04:13:44,606 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:13:44,869 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2014-04-22 04:13:44,870 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.223.49.195:8250
> 2014-04-22 04:13:44,892 WARN  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) Could not read cpuinfo_max_freq
> 2014-04-22 04:13:45,099 INFO  [cloud.serializer.GsonHelper] (Agent-Handler-1:null) Default Builder inited.
> 2014-04-22 04:13:45,166 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Proccess agent startup answer, agent id = 0
> 2014-04-22 04:13:45,166 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Set agent id 0
> 2014-04-22 04:13:45,174 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Startup Response Received: agent id = 0
> 2014-04-22 04:14:43,163 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2014-04-22 04:14:48,164 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Reconnecting...
> 2014-04-22 04:14:48,164 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.223.49.195:8250
> 2014-04-22 04:14:48,257 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> 2014-04-22 04:14:48,257 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.223.49.195:8250
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)