Posted to users@cloudstack.apache.org by 明 达 <gu...@hotmail.com> on 2016/09/30 01:14:10 UTC

Hosts can not connect to secondary storage(NFS)

Hi, Community

We have a small production environment consisting of 5 hosts (KVM, Ubuntu 14.04); the secondary storage is NFS running on a separate management host.

A few days ago we mistakenly put one host into 'maintenance', which caused all the VMs running on that host to migrate to the other available hosts. Those hosts then went into the 'alert' or 'disconnected' state in the ACS UI, and meanwhile the kernel log shows the repeated message 'kernel: [3270144.284365] nfs: server 10.226.32.4 not responding, timed out'.

It seems none of the hosts can mount or unmount the NFS storage. We had to use 'umount -lf' to force-unmount the NFS share, and we got the host state back to normal by restarting libvirt and the cloudstack agent. But the issue remains: none of the hosts can mount NFS, and they consistently report 'nfs: server 10.226.32.4 not responding, timed out'.

To isolate the issue we added a fresh host to the environment, and it can communicate with the NFS server with no problem. So the issue seems to affect only the existing 5 hosts. We guess it could be fixed by restarting the hosts, but we cannot afford that right now since they are all running production apps.

Can anyone share some advice or hints on how to get the secondary storage back? Thanks a lot!

________________________________
gumingda@hotmail.com

Re: Hosts can not connect to secondary storage(NFS)

Posted by Dag Sonstebo <Da...@shapeblue.com>.
Hi Gumingda,

This is most likely caused either by a lack of network connectivity between the hosts and your NFS head (which I guess you will have checked) or by the permissions set in the /etc/exports file on your NFS server.
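
As a rough way to test the connectivity part from each affected host, here is a minimal Python sketch. It assumes the NFS server listens on the standard TCP ports (111 for rpcbind, 2049 for nfsd), which may not match your setup. The helper name tcp_port_open is made up for illustration and is not part of CloudStack.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A successful connect shows
    that the port is reachable, not that an NFS mount will succeed.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_port_open("10.226.32.4", 2049) could be run on one of the five existing hosts and on the fresh host to compare results. If the ports are reachable from both, the /etc/exports permissions are the next thing to look at.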

If you have checked and ruled out both of these, also check the UIDs of the users on your working and non-working hosts. If these differ, you may find that the permissions on your NFS server file system need to be relaxed, i.e. you may need to chmod to 0770 rather than 0700.
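
To see what the exported directory currently allows, something like the following Python sketch could be run on the NFS server itself (not on a host with a hung mount, where a stat call may block). The function names are hypothetical, and the export path is whatever your server actually uses.

```python
import os
import stat


def describe_permissions(path):
    """Return (owner uid, owner gid, mode as an octal string such as '0700')."""
    st = os.stat(path)
    return st.st_uid, st.st_gid, format(stat.S_IMODE(st.st_mode), "04o")


def group_has_full_access(path):
    """Return True if the group bits include read, write and execute."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & stat.S_IRWXG == stat.S_IRWXG
```

A directory at mode 0700 makes group_has_full_access return False, while 0770 makes it return True. The UID from describe_permissions can then be compared with the UID of the relevant user on the working and non-working hosts.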

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue



Dag.Sonstebo@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue