You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Jim Blye (Jira)" <ji...@apache.org> on 2022/11/15 14:44:00 UTC

[jira] [Comment Edited] (DAEMON-449) Prunsrv fails to stop Open Liberty Windows service

    [ https://issues.apache.org/jira/browse/DAEMON-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634389#comment-17634389 ] 

Jim Blye edited comment on DAEMON-449 at 11/15/22 2:43 PM:
-----------------------------------------------------------

[~markt] , the root cause actually seems to be happening on *start* rather than {*}stop{*}.  When the service is started, it invokes server.bat.  The server.bat script starts a jvm (the liberty server) and then starts another jvm which waits for the status of the server to indicate that the server is all the way up.   The problem seems to be that prunsrv does not wait for the server.bat script to finish.  In the Windows "services" panel, the service start seems to be completed, though the server.bat file is still running waiting for the liberty server to come up.  If I try to stop the service too early, the stop fails as described above.

I discovered this by adding some logging to server.bat.  Before and after the starting of the jvms, I added logging statements.   Then I recreate the problem by starting the server and then immediately stopping the server as soon as the Windows "services" panel will let me.   What I find in the log file is that the server is starting and then I see that is stopping.  I don't see the message that it finished starting.

Is there a way to make prunsrv wait for server.bat to complete?
{code:java}
[2022-11-14 15:04:18] [debug] ( prunsrv.c:1763) [11932] Apache Commons Daemon procrun log initialized.
[2022-11-14 15:04:18] [info]  ( prunsrv.c:1767) [11932] Apache Commons Daemon procrun (1.2.2.0 32-bit) started.
[2022-11-14 15:04:18] [info]  ( prunsrv.c:1677) [11932] Running Service 'defaultServer'...
[2022-11-14 15:04:18] [debug] ( prunsrv.c:1448) [ 6136] Inside ServiceMain...
[2022-11-14 15:04:18] [debug] ( prunsrv.c:910 ) [ 6136] reportServiceStatusE: dwCurrentState = 2, dwWin32ExitCode = 0, dwWaitHint = 3000, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:18] [info]  ( prunsrv.c:1203) [ 6136] Starting service...
[2022-11-14 15:04:18] [debug] (rprocess.c:519 ) [ 6136] Apache Commons Daemon apxProcessExecute()
[2022-11-14 15:04:18] [debug] (rprocess.c:258 ) [ 6136] Apache Commons Daemon procrun __apxProcCreateChildPipes()
[2022-11-14 15:04:18] [debug] (rprocess.c:535 ) [ 6136] Apache Commons Daemon AplZeroMemory()
[2022-11-14 15:04:18] [debug] (rprocess.c:550 ) [ 6136] Apache Commons Daemon GetEnvironmentStringsW()
[2022-11-14 15:04:18] [debug] (rprocess.c:572 ) [ 6136] Apache Commons Daemon CreateProcessW()
[2022-11-14 15:04:18] [debug] (rprocess.c:594 ) [ 6136] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:18] [debug] (rprocess.c:597 ) [ 6136] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:18] [debug] (rprocess.c:600 ) [ 6136] Apache Commons Daemon ResumeThread()
[2022-11-14 15:04:19] [debug] (rprocess.c:602 ) [ 6136] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:19] [debug] (rprocess.c:608 ) [ 6136] Apache Commons Daemon apxProcessExecute() returning TRUE
[2022-11-14 15:04:19] [info]  ( prunsrv.c:1359) [ 6136] Service started in 249 milliseconds.
[2022-11-14 15:04:19] [debug] ( prunsrv.c:910 ) [ 6136] reportServiceStatusE: dwCurrentState = 4, dwWin32ExitCode = 0, dwWaitHint = 0, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:19] [debug] ( prunsrv.c:1603) [ 6136] Waiting for worker to finish...
[2022-11-14 15:04:22] [debug] ( prunsrv.c:910 ) [11932] reportServiceStatusE: dwCurrentState = 3, dwWin32ExitCode = 0, dwWaitHint = 3000, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:22] [info]  ( prunsrv.c:1010) [ 7488] Stopping service...
[2022-11-14 15:04:22] [debug] (rprocess.c:519 ) [ 7488] Apache Commons Daemon apxProcessExecute()
[2022-11-14 15:04:22] [debug] (rprocess.c:258 ) [ 7488] Apache Commons Daemon procrun __apxProcCreateChildPipes()
[2022-11-14 15:04:22] [debug] (rprocess.c:535 ) [ 7488] Apache Commons Daemon AplZeroMemory()
[2022-11-14 15:04:22] [debug] (rprocess.c:550 ) [ 7488] Apache Commons Daemon GetEnvironmentStringsW()
[2022-11-14 15:04:22] [debug] (rprocess.c:572 ) [ 7488] Apache Commons Daemon CreateProcessW()
[2022-11-14 15:04:22] [debug] (rprocess.c:594 ) [ 7488] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:597 ) [ 7488] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:600 ) [ 7488] Apache Commons Daemon ResumeThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:602 ) [ 7488] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:608 ) [ 7488] Apache Commons Daemon apxProcessExecute() returning TRUE
[2022-11-14 15:04:22] [debug] ( prunsrv.c:1131) [ 7488] Waiting for stop worker to finish...
[2022-11-14 15:04:24] [debug] (rprocess.c:122 ) [ 4872] Child process exit code 21
[2022-11-14 15:04:24] [debug] ( prunsrv.c:1133) [ 7488] Stop worker finished.
[2022-11-14 15:04:24] [debug] ( prunsrv.c:910 ) [ 7488] reportServiceStatusE: dwCurrentState = 3, dwWin32ExitCode = 0, dwWaitHint = 300000, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:24] [debug] ( prunsrv.c:1167) [ 7488] Waiting for worker to die naturally...
[2022-11-14 15:04:26] [debug] (rprocess.c:122 ) [ 4136] Child process exit code 0 {code}
 

I think where it says *Service started in 249 milliseconds,* the service appears to be started in the Windows Services panel (guessing).  But then I see *Waiting for worker to finish...*   I assume that means it is still waiting for server.bat to finish, but it has already reported that the service is running?

Ultimately the service stop fails because the server.bat cannot communicate with the liberty server, because the liberty server isn't up yet.  


was (Author: jimblye):
[~markt] , the root cause actually seems to be happening on *start* rather than {*}stop{*}.  When the service is started, it invokes server.bat.  The server.bat script starts a jvm (the liberty server) and then starts another jvm which waits for the status of the server to indicate that the server is all the way up.   The problem seems to be that prunsrv does not wait for the server.bat script to finish.  In the Windows "services" panel, the service start seems to be completed, though the server.bat file is still running waiting for the liberty server to come up.  If I try to stop the service too early, the stop fails as described above.

I discovered this by adding some logging to server.bat.  Before and after the starting of the jvms, I added logging statements.   Then I recreate the problem by starting the server and then immediately stopping the server as soon as the Windows "services" panel will let me.   What I find in the log file is that the server is starting and then I see that is stopping.  I don't see the message that it finished starting.

Is there a way to make prunsrv wait for server.bat to complete?
{code:java}
[2022-11-14 15:04:18] [debug] ( prunsrv.c:1763) [11932] Apache Commons Daemon procrun log initialized.
[2022-11-14 15:04:18] [info]  ( prunsrv.c:1767) [11932] Apache Commons Daemon procrun (1.2.2.0 32-bit) started.
[2022-11-14 15:04:18] [info]  ( prunsrv.c:1677) [11932] Running Service 'defaultServer'...
[2022-11-14 15:04:18] [debug] ( prunsrv.c:1448) [ 6136] Inside ServiceMain...
[2022-11-14 15:04:18] [debug] ( prunsrv.c:910 ) [ 6136] reportServiceStatusE: dwCurrentState = 2, dwWin32ExitCode = 0, dwWaitHint = 3000, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:18] [info]  ( prunsrv.c:1203) [ 6136] Starting service...
[2022-11-14 15:04:18] [debug] (rprocess.c:519 ) [ 6136] Apache Commons Daemon apxProcessExecute()
[2022-11-14 15:04:18] [debug] (rprocess.c:258 ) [ 6136] Apache Commons Daemon procrun __apxProcCreateChildPipes()
[2022-11-14 15:04:18] [debug] (rprocess.c:535 ) [ 6136] Apache Commons Daemon AplZeroMemory()
[2022-11-14 15:04:18] [debug] (rprocess.c:550 ) [ 6136] Apache Commons Daemon GetEnvironmentStringsW()
[2022-11-14 15:04:18] [debug] (rprocess.c:572 ) [ 6136] Apache Commons Daemon CreateProcessW()
[2022-11-14 15:04:18] [debug] (rprocess.c:594 ) [ 6136] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:18] [debug] (rprocess.c:597 ) [ 6136] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:18] [debug] (rprocess.c:600 ) [ 6136] Apache Commons Daemon ResumeThread()
[2022-11-14 15:04:19] [debug] (rprocess.c:602 ) [ 6136] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:19] [debug] (rprocess.c:608 ) [ 6136] Apache Commons Daemon apxProcessExecute() returning TRUE
[2022-11-14 15:04:19] [info]  ( prunsrv.c:1359) [ 6136] Service started in 249 milliseconds.
[2022-11-14 15:04:19] [debug] ( prunsrv.c:910 ) [ 6136] reportServiceStatusE: dwCurrentState = 4, dwWin32ExitCode = 0, dwWaitHint = 0, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:19] [debug] ( prunsrv.c:1603) [ 6136] Waiting for worker to finish...
[2022-11-14 15:04:22] [debug] ( prunsrv.c:910 ) [11932] reportServiceStatusE: dwCurrentState = 3, dwWin32ExitCode = 0, dwWaitHint = 3000, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:22] [info]  ( prunsrv.c:1010) [ 7488] Stopping service...
[2022-11-14 15:04:22] [debug] (rprocess.c:519 ) [ 7488] Apache Commons Daemon apxProcessExecute()
[2022-11-14 15:04:22] [debug] (rprocess.c:258 ) [ 7488] Apache Commons Daemon procrun __apxProcCreateChildPipes()
[2022-11-14 15:04:22] [debug] (rprocess.c:535 ) [ 7488] Apache Commons Daemon AplZeroMemory()
[2022-11-14 15:04:22] [debug] (rprocess.c:550 ) [ 7488] Apache Commons Daemon GetEnvironmentStringsW()
[2022-11-14 15:04:22] [debug] (rprocess.c:572 ) [ 7488] Apache Commons Daemon CreateProcessW()
[2022-11-14 15:04:22] [debug] (rprocess.c:594 ) [ 7488] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:597 ) [ 7488] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:600 ) [ 7488] Apache Commons Daemon ResumeThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:602 ) [ 7488] Apache Commons Daemon CreateThread()
[2022-11-14 15:04:22] [debug] (rprocess.c:608 ) [ 7488] Apache Commons Daemon apxProcessExecute() returning TRUE
[2022-11-14 15:04:22] [debug] ( prunsrv.c:1131) [ 7488] Waiting for stop worker to finish...
[2022-11-14 15:04:24] [debug] (rprocess.c:122 ) [ 4872] Child process exit code 21
[2022-11-14 15:04:24] [debug] ( prunsrv.c:1133) [ 7488] Stop worker finished.
[2022-11-14 15:04:24] [debug] ( prunsrv.c:910 ) [ 7488] reportServiceStatusE: dwCurrentState = 3, dwWin32ExitCode = 0, dwWaitHint = 300000, dwServiceSpecificExitCode = 0.
[2022-11-14 15:04:24] [debug] ( prunsrv.c:1167) [ 7488] Waiting for worker to die naturally...
[2022-11-14 15:04:26] [debug] (rprocess.c:122 ) [ 4136] Child process exit code 0 {code}
 

I think where it says *Service started in 249 milliseconds* the service appears to be started in the Windows Services panel (guessing).  But then I see "Waiting for worker to finish...".   I assume that means it is still waiting for server.bat to finish, but it has already reported that the service is running?

Ultimately the service stop fails because the server.bat cannot communicate with the liberty server, because the liberty server isn't up yet.  

> Prunsrv fails to stop Open Liberty Windows service
> --------------------------------------------------
>
>                 Key: DAEMON-449
>                 URL: https://issues.apache.org/jira/browse/DAEMON-449
>             Project: Commons Daemon
>          Issue Type: Bug
>          Components: prunsrv
>    Affects Versions: 1.2.2
>         Environment: I don't think the Windows version matters, but I am using:
> Microsoft Windows [Version 10.0.19044.2006]
>            Reporter: Jim Blye
>            Priority: Major
>         Attachments: commons-daemon.2022-09-28.log, commons-daemon.2022-09-28_afterKillingServer.log, server.bat
>
>
> When Prunsrv is used to register Open Liberty ([https://openliberty.io|https://openliberty.io/]) as a Windows Service, stopping the service hangs randomly.
> *To recreate the problem:*
>   1) Download open liberty  ~300MB
>            [https://public.dhe.ibm.com/ibmdl/export/pub/software/openliberty/runtime/release/22.0.0.10/openliberty-22.0.0.10.zip]
>       Alternatively, you could use just the liberty kernel ~10MB
>           [https://public.dhe.ibm.com/ibmdl/export/pub/software/openliberty/runtime/release/22.0.0.10/openliberty-kernel-22.0.0.10.zip]
>   2) md C:\temp\openliberty
>        unzip openliberty-22.0.0.10.zip into C:\temp\openliberty
>   3) cd C:\temp\openliberty\wlp\bin
>   4) Run these two commands to make sure the server runs and to initialize the environment:
>        server start
>        server stop
>    5) Register liberty as a service (parameter is case sensitive).  The following command is used to invoke prunsrv to register the service using the default name of defaultServer
>         server registerWinService
>     6)  Open the windows "services" app (services.msc).   Start and stop the service until the stop hangs.  It usually doesn't take more than a couple of tries.
>  
> *Details*
> The *server.bat* script registers the service like this:
> "C:\temp\openliberty\wlp\bin\tools\win\prunsrv.exe"  //IS//defaultServer --Startup=manual --DisplayName="defaultServer" --Description="Open Liberty" ++DependsOn=Tcpip --LogPath="C:\temp\openliberty\wlp\usr\servers\defaultServer\logs" --StdOutput=auto --StdError=auto --StartMode=exe --StartPath="C:\temp\openliberty\wlp" --StartImage="C:\temp\openliberty\wlp\bin\server.bat" ++StartParams=start#defaultServer --StopMode=exe --StopPath="C:\temp\openliberty\wlp" --StopImage="C:\temp\wlp\bin\server.bat" ++StopParams=stop#defaultServer --ServiceUser=LocalSystem
> After registering the service you can look at its properties.  You will see:
>    Path to executable:
>    C:\temp\openliberty\wlp\bin\tools\win\prunsrv.exe //RS//defaultServer
> When I (and many of our customers) try to stop the service, it hangs.   At that point, the server, named defaultServer is still up and running.  I can send commands to the server.  For instance, I can execute a "server dump" command. This sends the "dump" command to the server, and the server writes information to a file.  I can also stop the server by executing "server stop".  This sends the "stop" command to the server which causes the server to stop.  When the server stops, the hanging Windows service stops as well.
> I have attached a *server.bat* which adds --LogLevel Debug to the prunsrv commands and also *commons-daemon.2022-09-28.log* which demonstrates the problem. It shows registering, starting, and stopping the service.  Finally, I uploaded {*}commons-daemon.2022-09-28_afterKillingServer.log{*}, which is a continuation of the other log file which shows the result of issuing the server stop command to stop the server.
> Thanks for looking at this.
> Jim Blye



--
This message was sent by Atlassian Jira
(v8.20.10#820010)