Posted to issues@commons.apache.org by "Klaus Malorny (Jira)" <ji...@apache.org> on 2023/04/27 13:32:00 UTC

[jira] [Comment Edited] (DAEMON-459) Restart only works once (regression)

    [ https://issues.apache.org/jira/browse/DAEMON-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717230#comment-17717230 ] 

Klaus Malorny edited comment on DAEMON-459 at 4/27/23 1:31 PM:
---------------------------------------------------------------

Hi Mark,

You cannot trigger the problem by sending signals to the process from outside. The problem is that the code tries to send a signal to itself but fails, because the variable that is supposed to hold the process's own PID does not hold it. The first time it works because the variable contains 0, which makes kill send the signal to all processes of the process group (which includes the child process). On the second attempt, it contains the process ID of the previous child, which no longer exists. I guess that the kill function returns an error, but the return value is not checked.

You can alter the {{main_reload}} function at line 1408 (et seqq.) to print the {{controlled}} variable and, if you like, the result of the kill call to see what's happening. You need to trigger the call of this function from within the application, i.e. via the {{DaemonController.reload()}} method in Java. I don't know whether Tomcat can somehow be prompted to do so.
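
To illustrate the behaviour described above outside of jsvc, here is a minimal sketch in C. The names {{logged_kill}} and {{make_stale_pid}} are hypothetical helpers, not part of jsvc-unix.c. The sketch only shows how kill treats a target of 0, the caller's own PID, and the PID of a process that no longer exists, and how logging the return value makes a failure visible.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `sig` to `target`, log the target and the
 * outcome, and return 0 on success or the errno value on failure.
 */
static int logged_kill(pid_t target, int sig)
{
    int err;

    if (kill(target, sig) == 0) {
        fprintf(stderr, "kill(%ld, %d) succeeded\n", (long)target, sig);
        return 0;
    }
    err = errno;
    fprintf(stderr, "kill(%ld, %d) failed: %s\n",
            (long)target, sig, strerror(err));
    return err;
}

/*
 * Hypothetical helper: fork a child that exits at once, reap it, and
 * return its PID, which then refers to no existing process (unless the
 * system happens to reuse it immediately, which is unlikely).
 */
static pid_t make_stale_pid(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return pid;
}
```

With signal number 0, kill performs only the existence and permission checks and delivers nothing, so the helper can be tried safely: {{logged_kill(0, 0)}} and {{logged_kill(getpid(), 0)}} should succeed, while {{logged_kill(make_stale_pid(), 0)}} should report ESRCH ("No such process").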


was (Author: JIRAUSER299952):
Hi Mark,

you cannot trigger the problem by sending signals from outside to the process. The problem is that the code tries to send a kill to itself, but fails to do so as the variable allegedly holding the own process ID simply does not. The first time it works as the variable contains 0, which sends the signal to all processes of the process group (in which the child process is contained). In the second attempt, it contains the process ID of the previous child, which no longer exists. I guess that the kill function returns an error, but it is not checked.

You can alter the {{main_reload}} function in line 1408 (et seqq.) and print out the {{controlled}} variable, and, if you like, print the result of the kill to see what's happening. You need to trigger the call of this function from within the application, i.e.{{ DaemonController.reload () }}method in Java. I don't know whether Tomcat can be somehow prompted to do so.

> Restart only works once (regression)
> ------------------------------------
>
>                 Key: DAEMON-459
>                 URL: https://issues.apache.org/jira/browse/DAEMON-459
>             Project: Commons Daemon
>          Issue Type: Bug
>          Components: Jsvc
>    Affects Versions: 1.3.3
>            Reporter: Klaus Malorny
>            Priority: Major
>
> For certain functions, especially code updates, we rely on the ability to restart the child process. This seems to work only once. On the subsequent attempt, the child process hangs.
> I tracked down the problem and found out that the problem is within the {{jsvc-unix.c}} file. The {{main_reload}} function is called to send the signal to itself, but this does not happen. In the first restart, the {{controlled}} variable holds the value of 0. This works by chance, as the signal is sent to the parent, which sends it back to the child. In the second attempt, the variable holds the PID of the previous child, thus the signal is sent to a no longer existing process.
> The {{controlled}} variable is used both by the parent and the child process. In earlier versions of the file, the child process determines its own PID by using the {{getpid}} system function. This call has been – likely accidentally – removed in version 1.3.3 or earlier. Thus, the variable contains the parent's value before the fork which has created the child.
> The solution is simple: in the function {{child}}, add
> {{    controlled = getpid ();}}
> between the {{sigaction}} calls and the {{log_debug ("Waiting for a signal to be delivered")}} call (line 913 in my copy of the file), i.e.
> {{    ...}}
> {{    memset(&act, '\0', sizeof(act));}}
> {{    act.sa_handler = handler;}}
> {{    sigemptyset(&act.sa_mask);}}
> {{    act.sa_flags = SA_RESTART | SA_NOCLDSTOP;}}
> {{    sigaction(SIGHUP, &act, NULL);}}
> {{    sigaction(SIGUSR1, &act, NULL);}}
> {{    sigaction(SIGUSR2, &act, NULL);}}
> {{    sigaction(SIGTERM, &act, NULL);}}
> {{    sigaction(SIGINT, &act, NULL);}}
> {{    controlled = getpid ();}}
> {{    log_debug("Waiting for a signal to be delivered");}}
> {{    create_tmp_file(args);}}
> {{    while (!stopping) {}}
> {{    ...}}
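
The effect of the proposed one-line fix can be reproduced in isolation. The following is a minimal sketch, not the jsvc code: {{controlled_pid}} is a hypothetical stand-in for the {{controlled}} variable, and {{child_knows_own_pid}} is an invented helper. It assumes, as described above, that the parent stores the child's PID in the variable after each fork, so that the next child inherits the previous child's PID.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the `controlled` variable in jsvc-unix.c. */
static pid_t controlled_pid = 0;

/*
 * Fork once. If `refresh` is non-zero, the child first assigns its own
 * PID to controlled_pid (the proposed fix). The child then exits with
 * status 0 if controlled_pid equals its own PID and 1 otherwise.
 * The parent records the child's PID, waits for it, and returns the
 * child's exit status, or -1 on error.
 */
static int child_knows_own_pid(int refresh)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: controlled_pid still holds the value copied at fork. */
        if (refresh)
            controlled_pid = getpid();
        _exit(controlled_pid == getpid() ? 0 : 1);
    }
    /* Parent: remember the child, as a controlling process would. */
    controlled_pid = pid;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling {{child_knows_own_pid(0)}} twice models the first and second restart without the fix: the first child sees 0 and the second sees the PID of the first, so both calls return 1. With {{child_knows_own_pid(1)}} the child overwrites the inherited value and the call returns 0.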


