Posted to issues@cloudstack.apache.org by "exion (JIRA)" <ji...@apache.org> on 2018/04/09 08:32:00 UTC

[jira] [Updated] (CLOUDSTACK-10355) After upgrade to 4.11, Ceph RBD primary storage fails connection and renders node unusable

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

exion updated CLOUDSTACK-10355:
-------------------------------
    Description: 
On a perfectly working 4.10 node with the KVM hypervisor and Ceph RBD primary storage, after upgrading to 4.11 the CloudStack agent is unable to connect the RBD pool in libvirt and logs only a generic "operation not supported" error:

 

2018-04-06 16:27:37,650 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 16:27:37,652 WARN  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Storage pool be80af6a-7201-3410-8da4-9b3b58c4954f was not found running in libvirt. Need to create it.
2018-04-06 16:27:37,653 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Didn't find an existing storage pool be80af6a-7201-3410-8da4-9b3b58c4954f by UUID, checking for pools with duplicate paths
2018-04-06 16:27:37,664 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported
2018-04-06 16:27:42,762 INFO  [cloud.agent.Agent] (Agent-Handler-4:null) (logid:) Lost connection to the server. Dealing with the remaining commands...
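
For anyone checking several nodes for the same symptom, the log excerpt above can be scanned mechanically. The following is only a minimal sketch, assuming the agent log keeps the line layout quoted above (timestamp, level, bracketed logger name, message); the function name `find_rbd_failures` is hypothetical and not part of CloudStack. Because the ERROR line itself does not carry the pool UUID, the sketch reports the UUID from the most recent earlier line that mentions a storage pool.

```python
import re

# Layout of the agent log lines quoted above: timestamp, level, [logger], message.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<msg>.*)$"
)
# A UUID that directly follows the words "storage pool" (any capitalisation).
POOL_UUID = re.compile(
    r"storage pool ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def find_rbd_failures(lines):
    """Return (timestamp, pool_uuid, message) for each failed RBD pool creation.

    Hypothetical helper: pool_uuid is taken from the last earlier line that
    named a storage pool, or None if no such line was seen.
    """
    failures = []
    last_uuid = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not an agent log line (stack trace, blank line, ...)
        uuid_match = POOL_UUID.search(match["msg"])
        if uuid_match:
            last_uuid = uuid_match.group(1)
        if match["level"] == "ERROR" and "Failed to create RBD storage pool" in match["msg"]:
            failures.append((match["ts"], last_uuid, match["msg"]))
    return failures
```

The function accepts any iterable of lines, so an open file handle for the agent log can be passed directly.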

 

Exactly the same pool was working before the upgrade:

 

2018-04-06 12:53:52,847 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 12:53:52,850 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-06 12:53:52,850 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-06 12:53:53,171 INFO  [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:14dace5e) Proccess agent ready command, agent id = 46

 

To work around the issue, I took the following XML config (dumped from another node where the pool is running correctly) and defined the pool directly in libvirt, which worked as expected:

 

<pool type="rbd">
  <name>be80af6a-7201-3410-8da4-9b3b58c4954f</name>
  <uuid>be80af6a-7201-3410-8da4-9b3b58c4954f</uuid>
  <source>
    <name>cephstor1</name>
    <host name='storagepool1' port='6789'/>
    <auth username='admin' type='ceph'>
      <secret uuid='be80af6a-7201-3410-8da4-9b3b58c4954f'/>
    </auth>
  </source>
</pool>
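
If no healthy node is available to dump the definition from, an equivalent document can be generated. This is a minimal sketch, assuming the element layout shown above is all that is needed; the function name `build_rbd_pool_xml` and its parameter names are hypothetical. It sets `<uuid>` to the same value as `<name>`, as in the dumped config, since the agent log reports looking the pool up by UUID.

```python
import xml.etree.ElementTree as ET


def build_rbd_pool_xml(pool_uuid, ceph_pool, mon_host, mon_port, auth_user, secret_uuid):
    """Build a libvirt RBD pool definition mirroring the XML shown above.

    Hypothetical helper: name and uuid are both set to pool_uuid, matching
    the config dumped from the working node.
    """
    pool = ET.Element("pool", {"type": "rbd"})
    ET.SubElement(pool, "name").text = pool_uuid
    ET.SubElement(pool, "uuid").text = pool_uuid
    source = ET.SubElement(pool, "source")
    ET.SubElement(source, "name").text = ceph_pool
    ET.SubElement(source, "host", {"name": mon_host, "port": str(mon_port)})
    auth = ET.SubElement(source, "auth", {"username": auth_user, "type": "ceph"})
    ET.SubElement(auth, "secret", {"uuid": secret_uuid})
    return ET.tostring(pool, encoding="unicode")


if __name__ == "__main__":
    # Values taken from the report above; write the result to test.xml
    # before running virsh pool-define on it.
    print(build_rbd_pool_xml(
        pool_uuid="be80af6a-7201-3410-8da4-9b3b58c4954f",
        ceph_pool="cephstor1",
        mon_host="storagepool1",
        mon_port=6789,
        auth_user="admin",
        secret_uuid="be80af6a-7201-3410-8da4-9b3b58c4954f",
    ))
```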

 

virsh pool-define test.xml
Pool be80af6a-7201-3410-8da4-9b3b58c4954f defined from test.xml

root@compute6:~# virsh pool-start  be80af6a-7201-3410-8da4-9b3b58c4954f
Pool be80af6a-7201-3410-8da4-9b3b58c4954f started

root@compute6:~# virsh pool-info be80af6a-7201-3410-8da4-9b3b58c4954f
Name:           be80af6a-7201-3410-8da4-9b3b58c4954f
UUID:           be80af6a-7201-3410-8da4-9b3b58c4954f
State:          running
Persistent:     yes
Autostart:      no
Capacity:       10.05 TiB
Allocation:     2.22 TiB
Available:      2.71 TiB
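
The `virsh pool-info` output above can also be checked in a script before restarting the agent. This is a rough sketch, assuming the "Key: value" layout shown above; the helper names `parse_pool_info` and `pool_is_usable` are hypothetical, and the three conditions (UUID equal to the expected one, state running, pool persistent) are an assumption drawn from this output, not a documented CloudStack requirement.

```python
def parse_pool_info(text):
    """Parse 'Key:   value' lines, as printed by virsh pool-info, into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def pool_is_usable(info, expected_uuid):
    """Assumed criteria: matching UUID, running state, persistent definition."""
    return (
        info.get("UUID") == expected_uuid
        and info.get("State") == "running"
        and info.get("Persistent") == "yes"
    )
```

The text passed to `parse_pool_info` would be the captured standard output of `virsh pool-info <pool>`.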

 

The CloudStack agent now starts correctly:

 

2018-04-09 10:29:19,989 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:f0021131) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-09 10:29:19,990 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:f0021131) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-09 10:29:19,991 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:f0021131) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-09 10:29:20,372 INFO  [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:f0021131) Proccess agent ready command, agent id = 56

 

 

  was:
On a perfectly working 4.10 node with the KVM hypervisor and Ceph RBD primary storage, after upgrading to 4.11 the CloudStack agent is unable to connect the RBD pool in libvirt and logs only a generic "operation not supported" error:

 

2018-04-06 16:27:37,650 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 16:27:37,652 WARN  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Storage pool be80af6a-7201-3410-8da4-9b3b58c4954f was not found running in libvirt. Need to create it.
2018-04-06 16:27:37,653 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Didn't find an existing storage pool be80af6a-7201-3410-8da4-9b3b58c4954f by UUID, checking for pools with duplicate paths
2018-04-06 16:27:37,664 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported
2018-04-06 16:27:42,762 INFO  [cloud.agent.Agent] (Agent-Handler-4:null) (logid:) Lost connection to the server. Dealing with the remaining commands...

 

Exactly the same pool was working before the upgrade:

 

2018-04-06 12:53:52,847 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 12:53:52,850 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-06 12:53:52,850 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-06 12:53:53,171 INFO  [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:14dace5e) Proccess agent ready command, agent id = 46

 

To narrow down the issue and rule out system-related causes, I used the following XML config to attach the pool directly to libvirt, and it worked as expected:

 

<pool type="rbd">
  <name>be80af6a-7201-3410-8da4-9b3b58c4954f</name>
  <source>
    <name>cephstor1</name>
    <host name='storagepool1' port='6789'/>
    <auth username='admin' type='ceph'>
      <secret uuid='XXXXX'/>
    </auth>
  </source>
</pool>

 

virsh pool-create test.xml
Pool be80af6a-7201-3410-8da4-9b3b58c4954f created from test.xml

root@compute6:~# virsh pool-info be80af6a-7201-3410-8da4-9b3b58c4954f
Name:           be80af6a-7201-3410-8da4-9b3b58c4954f
UUID:           47afe7d4-61cb-46c5-a642-93712c758b5c
State:          running
Persistent:     no
Autostart:      no
Capacity:       10.05 TiB
Allocation:     2.22 TiB
Available:      2.71 TiB

 

That being said, the issue appears to be related to the way the CloudStack scripts interface with the libvirt daemon.

 

 

 

 


> After upgrade to 4.11, Ceph RBD primary storage fails connection and renders node unusable
> ------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-10355
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10355
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: cloudstack-agent
>    Affects Versions: 4.11.0.0
>            Reporter: exion
>            Priority: Blocker



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)