Posted to dev@vcl.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/02/01 18:28:51 UTC

[jira] [Commented] (VCL-924) Commands may hang on management node if it has an unavailable NFS share

    [ https://issues.apache.org/jira/browse/VCL-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848748#comment-15848748 ] 

ASF subversion and git services commented on VCL-924:
-----------------------------------------------------

Commit 1781295 from arkurth@apache.org in branch 'vcl/trunk'
[ https://svn.apache.org/r1781295 ]

VCL-924
Determined that running the lsof command on the management node is unsafe and commented out the code. It was only used for informational purposes in vcld.log, to determine which process owned a lock on a file. The lsof command occasionally sends ALRM signals, which may cause vcld processes to behave badly.

> Commands may hang on management node if it has an unavailable NFS share
> -----------------------------------------------------------------------
>
>                 Key: VCL-924
>                 URL: https://issues.apache.org/jira/browse/VCL-924
>             Project: VCL
>          Issue Type: Bug
>          Components: vcld (backend)
>    Affects Versions: 2.4.2
>            Reporter: Andy Kurth
>            Assignee: Andy Kurth
>             Fix For: 2.5
>
>
> We came across a situation on one of our management nodes related to this:
> https://bugzilla.redhat.com/show_bug.cgi?id=962755
> The management node had an old NFS share mounted from a storage unit which was removed from service.  Attempts to unmount the share were not successful.
> Under fairly rare circumstances, a vcld process will call lsof on the management node in order to determine which other vcld process is preventing it from obtaining a semaphore.  This vcld process hung indefinitely due to the unavailable NFS share and the issue described in the link above.
> There is currently no timeout mechanism built into the code which executes commands locally on the management node.  It would be beneficial to add one and specify a timeout on commands which may hang such as lsof.
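
The timeout mechanism proposed in the description could look roughly like the sketch below. It is written in Python for illustration only and is not the vcld code; the helper name run_with_timeout and its return convention are assumptions made for this example. It runs a local command, gives up after a fixed number of seconds, and avoids waiting indefinitely on a child that cannot be killed.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run a local command with a time limit (hypothetical helper).

    Returns (exit_status, output) when the command finishes in time,
    or (None, None) when the timeout expires.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    try:
        output, _ = process.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Ask the kernel to kill the child, then give it a short, bounded
        # time to exit. A process blocked on an unavailable NFS share may
        # not exit promptly, so the caller must never wait without a limit.
        process.kill()
        try:
            process.communicate(timeout=1)
        except subprocess.TimeoutExpired:
            pass
        return None, None
    return process.returncode, output
```

A caller could then wrap a command that may hang, for example run_with_timeout(["lsof", "/path/to/lockfile"], 30) with a placeholder path, and treat a (None, None) result as "unknown" rather than blocking the calling process.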


