Posted to issues@commons.apache.org by "Mladen Turk (JIRA)" <ji...@apache.org> on 2011/01/18 10:56:43 UTC

[jira] Closed: (DAEMON-183) Abnormal shutdown leaves the pidfile, which prevents subsequent startup

     [ https://issues.apache.org/jira/browse/DAEMON-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mladen Turk closed DAEMON-183.
------------------------------

    Resolution: Duplicate

Some of the issues were resolved with DAEMON-188; others, such as querying for a running process, would be too complex to implement.
We rely on DeleteFile, which fails if the file is already open (in this case, held open by another running instance).
We also now create the pidfile with FILE_FLAG_DELETE_ON_CLOSE, which should ensure the file is
removed from the system in most cases. However, if the pidfile exists we still call DeleteFile before trying to open it.
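
The sequence described above could look roughly like the following. This is only a sketch under stated assumptions, not the actual procrun code: the helper name claim_pidfile is hypothetical, the ANSI ("A") API variants are used for brevity, and the share mode is one plausible choice.

```c
#ifdef _WIN32   /* Win32-only sketch; compiles to nothing elsewhere */
#include <windows.h>
#include <stdio.h>

/* Hypothetical helper, not procrun's actual code.
 * Returns a handle that must stay open for the life of the service,
 * or INVALID_HANDLE_VALUE if the pidfile could not be claimed. */
static HANDLE claim_pidfile(const char *path)
{
    HANDLE h;
    char buf[32];
    DWORD written = 0;
    int len;

    /* Clear a leftover file from an abnormal shutdown. If a running
     * instance still holds it open without FILE_SHARE_DELETE, DeleteFileA
     * fails (typically with ERROR_SHARING_VIOLATION) and we give up. */
    if (!DeleteFileA(path) && GetLastError() != ERROR_FILE_NOT_FOUND)
        return INVALID_HANDLE_VALUE;

    /* CREATE_NEW fails if the file reappeared in the meantime.
     * FILE_FLAG_DELETE_ON_CLOSE asks the system to remove the file once
     * the last handle to it is closed. */
    h = CreateFileA(path, GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_NEW,
                    FILE_ATTRIBUTE_NORMAL | FILE_FLAG_DELETE_ON_CLOSE, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return INVALID_HANDLE_VALUE;

    len = snprintf(buf, sizeof(buf), "%lu\r\n",
                   (unsigned long)GetCurrentProcessId());
    if (len < 0 || !WriteFile(h, buf, (DWORD)len, &written, NULL)) {
        CloseHandle(h);   /* closing the last handle also deletes the file */
        return INVALID_HANDLE_VALUE;
    }
    return h;
}
#endif
```

A caller would keep the returned handle open until shutdown and then pass it to CloseHandle. As far as I understand the CreateFile documentation, other programs that want to read such a pidfile need to include FILE_SHARE_DELETE (and FILE_SHARE_WRITE) in their own share mode, which may or may not be acceptable for a given deployment.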

> Abnormal shutdown leaves the pidfile, which prevents subsequent startup
> -----------------------------------------------------------------------
>
>                 Key: DAEMON-183
>                 URL: https://issues.apache.org/jira/browse/DAEMON-183
>             Project: Commons Daemon
>          Issue Type: Bug
>          Components: Procrun
>    Affects Versions: 1.0.3
>            Reporter: Steve Ash
>            Priority: Trivial
>
> This is really a trivial issue, so you may want to just close it as WONTFIX, but it does represent an inconsistency that I don't feel I can release into production, so I'm documenting it here.
> When using the pidfile with procrun, if the pidfile isn't deleted then the next startup fails, reporting that a pidfile exists.  Because I had configured the service incorrectly (my stopmode was not set, so my main thread never returned, causing it to time out), the pidfile often still existed after the service came down.  This in and of itself seems like it may be an issue.
> Nonetheless, on a subsequent startup it failed, reporting that a pidfile existed, but then deleted the existing pidfile.  So a second attempt to start would succeed.  It just felt a little strange that it would fail the first time and then work the second time.  I don't really know if it's wrong, but I know that my customers would feel this is fragile/weird.  Thus, I am just not using the pidfile.
> So a few thoughts:
> 1) Should the pidfile check go further and query for a running process with the expected image (servicename.exe) and process id?  If no such process exists, assume this is an orphaned pidfile, delete it, and continue startup.
> 2) Obviously, if the SCM or an external user kills the process then you can't delete the file, but I think the timeout that I experienced came from the SCM, not from the timeout in serviceStop (e.g. I don't think I had a "Worker was killed" message).  So are you aware of a problem with the timeout logic where the SCM will force the process down instead of waiting for procrun to time out?
> 3) Today, if the process aborts startup because the pidfile already exists, the gPidfileName global has already been set, and thus it deletes the pidfile (i.e. this is why the second attempt to start succeeds).  What happens if this pidfile represents a real, already running process?  Is the other process locking it, so that the delete would fail?  Or would it successfully delete the pidfile, now allowing multiple concurrent instances to run?
> Just a few minor things.  If you feel any of these things are worth implementing/changing, I would be happy to work on it and submit a patch.  If not, no worries.
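
The behaviour in point 3 comes down to when the process decides it owns the pidfile. A minimal, portable sketch of one way to order this follows. Only gPidfileName comes from the report; the function names are hypothetical, and the "wx" mode of fopen assumes a C11 runtime. It covers only the ownership ordering, not detection of stale files.

```c
#include <stdio.h>

/* Path removed at shutdown; NULL means "this process owns no pidfile".
 * gPidfileName is the global named in the report; the rest is hypothetical. */
static const char *gPidfileName = NULL;

/* Returns 0 on success, -1 if the pidfile exists or cannot be created. */
static int start_with_pidfile(const char *path, long pid)
{
    /* "wx" (C11) creates the file exclusively and fails if it exists. */
    FILE *fp = fopen(path, "wx");
    if (fp == NULL)
        return -1;          /* gPidfileName stays NULL: nothing to clean up */
    fprintf(fp, "%ld\n", pid);
    fclose(fp);
    gPidfileName = path;    /* claim ownership only after creation succeeded */
    return 0;
}

/* Shutdown hook: removes the pidfile only if this process created it. */
static void cleanup_pidfile(void)
{
    if (gPidfileName != NULL) {
        remove(gPidfileName);
        gPidfileName = NULL;
    }
}
```

With this ordering, a startup that aborts because the pidfile already exists leaves that file alone, so it cannot remove the pidfile of an instance that is still running.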

-- 
This message is automatically generated by JIRA.