You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Koushik Das (JIRA)" <ji...@apache.org> on 2013/09/04 13:21:54 UTC

[jira] [Resolved] (CLOUDSTACK-4499) Xen6.1/Xen6.2 hosts initially transition to 'Alert' and then to 'Up' after addHost

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koushik Das resolved CLOUDSTACK-4499.
-------------------------------------

    Resolution: Fixed
    
> Xen6.1/Xen6.2 hosts initially transition to 'Alert' and then to 'Up' after addHost
> ----------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4499
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4499
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: API, Management Server, XenServer
>    Affects Versions: 4.2.0
>            Reporter: Prasanna Santhanam
>            Assignee: Koushik Das
>            Priority: Critical
>             Fix For: 4.2.1
>
>         Attachments: cloud.sql.tar.bz2, mslog.tar.bz2, SMlog
>
>
> The same bug was reported in CLOUDSTACK-3839 which has been determined to be a
> UI issue but the problem is with the addHost API and Xenserver hosts 6.1/6.2. 
> When a freshly provisioned 6.1/6.2 host is added to cloudstack it initially
> goes into 'Alert' state thereby failing the storage pool creation subsequently.
> StoragePool addition requires the hosts to be in Up state. But a minute or two
> later the Xenserver host automatically moves back to Up state after which
> storage pool addition can occur
> When the host is added to cloudstack:
> 2013-08-26 06:46:22,965 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-1:null) Seq 1-1916403719: Executing request
> 2013-08-26 06:46:23,261 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-1:null) Can't find a vif on dom0 for link local, creating a new one
> 2013-08-26 06:46:23,274 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-1:null) Lowest available Vif device number: 0 for VM: Control domain on host: apache-81-3
> 2013-08-26 06:46:23,600 WARN  [xen.resource.CitrixResourceBase] (DirectAgent-1:null) Unable to create local link network
> The server failed to handle your request, due to an internal error.  The given message may give details useful for debugging the problem.
>     at com.xensource.xenapi.Types.checkResponse(Types.java:1694)
>     at com.xensource.xenapi.Connection.dispatch(Connection.java:368)
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:909)
>     at com.xensource.xenapi.VIF.plug(VIF.java:846)
> Host marked Alert:
> 2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl] (catalina-exec-14:null) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
> 2013-08-26 06:46:23,616 DEBUG [cloud.network.NetworkUsageManagerImpl] (catalina-exec-14:null) Disconnected called on 1 with status Alert
> 2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl] (catalina-exec-14:null) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
> 2013-08-26 06:46:23,619 DEBUG [cloud.host.Status] (catalina-exec-14:null) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 1, name = apache-81-3]
> 2013-08-26 06:46:23,625 DEBUG [cloud.host.Status] (catalina-exec-14:null) Agent status update: [id = 1; name = apache-81-3; old status = Connecting; event = AgentDisconnected; new status = Alert; old update count = 1; new update count = 2]
> Automatic retry by CloudStack a few moments later:
> 2013-08-26 06:46:52,752 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager Timer:null) Completed acquiring hosts for clusters not owned by any management server
> 2013-08-26 06:46:52,756 DEBUG [agent.manager.ClusteredAgentManagerImpl] (ClusteredAgentManager Timer:null) Found 2 unmanaged direct hosts, processing connect for them...
> 2013-08-26 06:46:52,757 DEBUG [agent.manager.ClusteredAgentManagerImpl] (ClusteredAgentManager Timer:null) Loading directly connected host 1(apache-81-3)
> Meanwhile Storage Pool addition fails:
> 2013-08-26 06:46:52,766 DEBUG [cloud.api.ApiServlet] (catalina-exec-17:null) ===END===  10.208.8.5 -- GET  username=root&apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&hypervisor=XenServer&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=addHost&url=http%3A%2F%2Fapache-81-2&signature=oJqOno9flCtPMSSHH%2FZD%2BJl5QXw%3D&response=json
> 2013-08-26 06:46:52,773 DEBUG [cloud.api.ApiServlet] (catalina-exec-16:null) ===START===  10.208.8.5 -- GET  apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&name=z0p0c0ps0&url=nfs%3A%2F%2Fnfs.fmt.vmops.com%3A%2Fexport%2Fautomation%2Facs%2Fprimary&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=createStoragePool&signature=70CuFbmf58PU4X0QbF1tKbBGT60%3D&response=json
> 2013-08-26 06:46:52,787 DEBUG [cloud.network.NetworkModelImpl] (ClusteredAgentManager Timer:null) Failed to retrive the default label for storage traffic:zone: 1 hypervisor: XenServer due to:Unable to find the default physical network with traffic=Storage in the specified zone id
> 2013-08-26 06:46:52,839 DEBUG [datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl] (catalina-exec-16:null) createPool Params @ scheme - nfs storageHost - nfs.fmt.vmops.com hostPath - /export/automation/acs/primary port - -1
> 2013-08-26 06:46:52,889 DEBUG [agent.manager.ClusteredAgentManagerImpl] (ClusteredAgentManager Timer:null) Loading directly connected host 2(apache-81-2)
> 2013-08-26 06:46:52,892 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-2:null) Simulating start for resource 192267e4-d97d-4e37-b899-82b157e3166f id 1
> 2013-08-26 06:46:52,900 DEBUG [cloud.storage.StorageManagerImpl] (catalina-exec-16:null) Failed to add data store
> com.cloud.utils.exception.CloudRuntimeException: No host up to associate a storage pool with in cluster 1
>     at org.apache.cloudstack.storage.datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl.attachCluster(CloudStackPrimaryDataStoreLifeCycleImpl.java:371)
>     at com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:749)
>     at com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:177)
> Host comes back to Up state:
> 2013-08-26 06:46:53,898 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-4:null) Seq 1-1519321095: Executing request
> 2013-08-26 06:46:54,279 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-4:null) already have a vif on dom0 for link local network
> 2013-08-26 06:46:54,703 DEBUG [agent.transport.Request] (AgentTaskPool-2:null) Seq -1--1: Startup request from directly connected host:  { Cmd , MgmtId: -1, via: -1, Ver: v1, Flags: 11, [{"cpus":4,"speed":2261,"memory":16190149248,"dom0MinMemory":588251136,"poolSync":false,"_clusterVMStates":{},"caps":"xen-3.0-x86_64 , xen-3.0-x86_32p , hvm-3.0-x86_32 , hvm-3.0-x86_32p , hvm-3.0-x86_64","pool":"9dfa936a-30f7-b5d1-4d4e-b5a75facb313","hypervisorType":"XenServer","hostDetails":{"product_version":"6.1.0","private.network.device":"Pool-wide network associated with eth0","com.cloud.network.Networks.RouterPrivateIpStrategy":"DcGlobal","product_brand":"XenServer","product_version_text_short":"6.1","can_bridge_firewall":"false"},"hypervisorVersion":"6.1.0","type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","name":"apache-81-3","version":"4.2.0","iqn":"iqn.2013-08.com.vmops.fmt:691cf4d5","publicIpAddress":"10.208.8.69","publicNetmask":"255.255.255.192","publicMacAddress":"b8:ac:6f:94:f1:67","privateIpAddress":"10.208.8.69","privateMacAddress":"b8:ac:6f:94:f1:67","privateNetmask":"255.255.255.192","storageIpAddress":"10.208.8.69","storageNetmask":"255.255.255.192","storageMacAddress":"b8:ac:6f:94:f1:67","wait":0},{"totalSize":0,"poolInfo":{"uuid":"ae248ed7-dd2a-cfc8-53f7-f43661c596c8","host":"10.208.8.69","localPath":"lvm","hostPath":"lvm","poolType":"LVM","capacityBytes":491505319936,"availableBytes":491501125632},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","wait":0}] }
> 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Dispatching resource state event CREATE_HOST_VO_FOR_DIRECT_CONNECT to BaremetalPxeManagerImpl_EnhancerByCloudStack_d79b93ae
> 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Dispatching resource state event CREATE_HOST_VO_FOR_DIRECT_CONNECT to PremiumSecondaryStorageManagerImpl_EnhancerByCloudStack_b51c5fde
> 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Dispatching resource state event CREATE_HOST_VO_FOR_DIRECT_CONNECT to LxcServerDiscoverer_EnhancerByCloudStack_449ab3e6
> 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Dispatching resource state event CREATE_HOST_VO_FOR_DIRECT_CONNECT to NiciraNvp
> 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Dispatching resource state event CREATE_HOST_VO_FOR_DIRECT_CONNECT to OvmDiscoverer_EnhancerByCloudStack_ec7ad054
> 2013-08-26 06:46:54,710 DEBUG [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Dispatching resource state event CREATE_HOST_VO_FOR_DIRECT_CONNECT to XcpServerDiscoverer_EnhancerByCloudStack_831158ad
> 2013-08-26 06:46:54,710 INFO  [xen.discoverer.XcpServerDiscoverer] (AgentTaskPool-2:null) Host: apache-81-3 connected with hypervisor type: XenServer. Checking CIDR...
> 2013-08-26 06:46:54,735 DEBUG [cloud.resource.ResourceState] (AgentTaskPool-2:null) Resource state update: [id = 1; name = apache-81-3; old state = Enabled; event = InternalCreated; new state = Enabled]
> 2013-08-26 06:46:54,736 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 1, name = apache-81-3]
> 2013-08-26 06:46:54,742 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) Agent status update: [id = 1; name = apache-81-3; old status = Connecting; event = AgentConnected; new status = Connecting; old update count = 3; new update count = 4]
> 2013-08-26 06:46:54,946 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Agent status update: [id = 1; name = apache-81-3; old status = Connecting; event = Ready; new status = Up; old update count = 4; new update count = 5]
> There appear to be two problems:
> 1. addHost API returns successfully even if the host has transitioned to 'Alert' state
> 2. Something wrong with Xenserver 6.2/6.1 hosts that makes this happen
> This is on the latest off the 4.2 branch (8689f7f).
> Attaching management server log and Database Dump.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira