Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2012/11/23 12:08:58 UTC

[jira] [Created] (HADOOP-9086) Enforce process singleton rules through an exclusive write lock on a file, not a pid file +kill -0,

Steve Loughran created HADOOP-9086:
--------------------------------------

             Summary: Enforce process singleton rules through an exclusive write lock on a file, not a pid file +kill -0,
                 Key: HADOOP-9086
                 URL: https://issues.apache.org/jira/browse/HADOOP-9086
             Project: Hadoop Common
          Issue Type: Improvement
          Components: util
    Affects Versions: 1.1.1, 2.0.3-alpha
         Environment: Unix/Linux. 
            Reporter: Steve Loughran


The {{hadoop-daemon.sh}} script (like other liveness monitors) checks whether a daemon service is running by sending {{kill -0}} to a process ID read from a pid file.
This is flawed:
# pid file locations may change between installations.
# Linux and Unix recycle pids, leading to false positives: the scripts think the daemon is running when the pid now belongs to a different process.
# It doesn't work on Windows.

Having the processes acquire an exclusive write lock on a known file would delegate lock management, and implicitly liveness detection, to the OS itself: when the process dies, the lock is released (on Unix systems).
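To make the contrast concrete, here is a minimal sketch (not existing Hadoop code) of the two liveness checks side by side. It assumes Linux with the util-linux {{flock(1)}} tool; the function names and file paths are hypothetical. The pid-file check trusts whatever pid the file holds, while the lock probe asks the kernel whether any live process currently holds the lock.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes Linux with util-linux flock(1); the function names
# and the paths passed in are hypothetical, not Hadoop settings.

# Current style of check, for contrast: trusts whatever pid the file holds,
# so a recycled pid owned by an unrelated process reads as "running".
pid_file_says_running() {
  local pidfile="$1"
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Lock-based check: 0 = some process holds an exclusive lock on the file,
# 1 = nobody holds it, 2 = the probe itself failed (e.g. permissions).
lock_says_running() {
  local lockfile="$1" status=0
  # Treat a missing lock file as "not running" (and don't create it here).
  [ -e "$lockfile" ] || return 1
  # -n: fail at once instead of waiting. If the lock is free, flock takes it
  # just long enough to run `true` and then releases it.
  flock -n "$lockfile" true || status=$?
  case "$status" in
    0) return 1 ;;  # lock was free, so there is no live holder
    1) return 0 ;;  # flock(1)'s default exit status for a conflicting lock
    *) return 2 ;;  # any other status: could not probe the file
  esac
}
```

One caveat: when the lock is free, the probe briefly takes it itself, so a daemon starting at that exact instant could see a conflict. A short retry, or {{flock -w}} with a timeout on the daemon side, would cover that window.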

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-9086) Enforce process singleton rules through an exclusive write lock on a file, not a pid file +kill -0,

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503163#comment-13503163 ] 

Steve Loughran commented on HADOOP-9086:
----------------------------------------

This is the strategy adopted by {{daemontools}}: [http://cr.yp.to/daemontools/setlock.html]

The {{setlock}} command does not modify the invoked program, but it does require that the service is only ever launched through {{setlock}}.
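A setlock-style wrapper might look like the following hypothetical sketch, which assumes Linux and util-linux {{flock(1)}} rather than daemontools itself; {{run_with_singleton_lock}} is an illustrative name. The lock is taken on a file descriptor that the service inherits across {{exec}}, so the service holds the lock for its whole lifetime without being modified.

```shell
#!/usr/bin/env bash
# Hypothetical setlock-style wrapper; assumes Linux util-linux flock(1).
# The service is unchanged: it is simply only ever started through here.
run_with_singleton_lock() {
  local lockfile="$1"
  shift
  # Create the lock file if needed; append mode (>>) never truncates it.
  exec 9>>"$lockfile" || return 1
  # Try for the exclusive lock without waiting.
  if ! flock -n 9; then
    echo "lock $lockfile is held; not starting $1" >&2
    return 1
  fi
  # Replace this shell with the service. The service inherits fd 9, so it
  # holds the lock until it (and any child still sharing fd 9) exits.
  exec "$@"
}
```

Because it ends in {{exec}}, it is meant to be called from a start script or a subshell, e.g. {{( run_with_singleton_lock /tmp/example.lock some-daemon )}}, where both names are placeholders. As with {{setlock}}, any children the service forks also inherit the descriptor and keep the lock alive until they exit.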

For Hadoop, some options are:
# Have the hadoop-service/init.d scripts use something similar to {{setlock}}.
# Move the lock creation logic into the singleton services themselves: they would take an option naming the lock file, attempt to create/open that file with an exclusive write lock on startup, and exit immediately if that cannot be done.
# The service scripts could then omit their own liveness checks, because the daemon would enforce this itself. However, pid files have other uses (e.g. {{sudo kill `cat /var/log/hadoop/namenode.pid`}}), so they should still be created, just not used to enforce singleton logic.

This *should* also work on Windows, with the caveat that older non-Server editions of Windows didn't always release file locks on process termination. Testing would be required there.
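For options 2 and 3, the daemon itself would take the lock and only then write a pid file that is kept for conveniences such as {{kill}}. A rough shell sketch, assuming bash (for {{$BASHPID}}) and Linux util-linux {{flock(1)}}, so it says nothing about the Windows question; the {{--lock-file}}/{{--pid-file}} options and the sleep loop are hypothetical stand-ins for a real service:

```shell
#!/usr/bin/env bash
# Hypothetical daemon startup: lock first, exit at once on conflict, then
# write a pid file that is never used for the liveness decision.
# Assumes bash and Linux util-linux flock(1).
singleton_daemon() {
  local lockfile="" pidfile=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --lock-file|--pid-file)
        [ $# -ge 2 ] || { echo "$1 needs a value" >&2; return 2; }
        if [ "$1" = --lock-file ]; then lockfile="$2"; else pidfile="$2"; fi
        shift 2 ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  if [ -z "$lockfile" ]; then
    echo "--lock-file is required" >&2
    return 2
  fi

  # Create/open the lock file and try for an exclusive lock without waiting.
  exec 9>>"$lockfile" || return 1
  if ! flock -n 9; then
    echo "another instance holds $lockfile; exiting" >&2
    return 1
  fi

  # Only the lock holder ever writes the pid file. $BASHPID (not $$) is this
  # process's own pid even when the function runs in a subshell.
  if [ -n "$pidfile" ]; then
    echo "$BASHPID" > "$pidfile"
  fi

  # Stand-in for the real service loop. "9>&-" closes the lock fd in each
  # child, so no stray child keeps the lock alive after the daemon dies.
  while :; do
    sleep 1 9>&-
  done
}
```

Writing the pid file only after the lock is acquired means a second, failed instance never overwrites it, so while the lock is held the pid file names the live daemon.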

                
