Posted to dev@ambari.apache.org by "Daniel Horak (JIRA)" <ji...@apache.org> on 2015/03/03 14:12:04 UTC
[jira] [Created] (AMBARI-9893) Ambari services should be properly
daemonized
Daniel Horak created AMBARI-9893:
------------------------------------
Summary: Ambari services should be properly daemonized
Key: AMBARI-9893
URL: https://issues.apache.org/jira/browse/AMBARI-9893
Project: Ambari
Issue Type: Bug
Components: ambari-agent, ambari-server
Affects Versions: 1.6.1
Environment: HDP 2.1 on RHEL 6
ambari-server-1.6.1-98.noarch
ambari-agent-1.6.1-98.x86_64
Reporter: Daniel Horak
Priority: Critical
Ambari services (_ambari-server_ and _ambari-agent_) are not properly daemonized.
When a service starts as a daemon, it should _become a process group leader_ ([among other requirements|https://en.wikipedia.org/wiki/Daemon_%28computing%29]).
h3. How to reproduce
1) Prepare a simple test shell script:
{noformat}
# cat test-ambari-server.sh
#!/bin/bash -x
ambari-server restart
sleep 10
ambari-server restart
sleep 10
date
# chmod +x test-ambari-server.sh
{noformat}
This script should restart ambari-server twice (with a delay in between) and then
print the date.
2) Run the test script.
The script doesn't behave as expected: the second _ambari-server restart_ kills
the whole script! See:
{noformat}
# ./test-ambari-server.sh
+ ambari-server restart
Using python /usr/bin/python2.6
Restarting ambari-server
Using python /usr/bin/python2.6
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python2.6
Starting ambari-server
Ambari Server running with 'root' privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Waiting for server start...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Ambari Server 'start' completed successfully.
+ sleep 10
+ ambari-server restart
Using python /usr/bin/python2.6
Restarting ambari-server
Using python /usr/bin/python2.6
Stopping ambari-server
Killed
# echo $?
137
{noformat}
h3. Explanation
After the first {{ambari-server restart}}, the _process group ID_ (_PGID_) of
ambari-server is the same as the _PGID_ of the test shell script. In other words,
ambari-server belongs to the same process group as the test script
because ambari-server has not become a _process group leader_.
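The inheritance described above can be reproduced without Ambari. The following is a minimal sketch, assuming Python 3 on a POSIX system. The helper names {{spawn_sleeper}} and {{shares_group_with_caller}} are hypothetical and are not part of Ambari. A child started normally stays in the caller's process group, while a child started with {{start_new_session=True}} (which calls {{setsid()}} in the child) becomes the leader of its own group.

```python
import os
import subprocess
import sys


def spawn_sleeper(new_session):
    """Start a child that just sleeps (hypothetical stand-in for a daemon).

    With new_session=True the child calls setsid() before exec, so it
    becomes the leader of a new session and a new process group.
    """
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=new_session,
    )


def shares_group_with_caller(pid):
    """Return True if `pid` is in the same process group as this process."""
    return os.getpgid(pid) == os.getpgid(0)


if __name__ == "__main__":
    for new_session in (False, True):
        child = spawn_sleeper(new_session)
        print("new_session=%s shares caller's group: %s"
              % (new_session, shares_group_with_caller(child.pid)))
        child.kill()
        child.wait()
```

On Python 2.6, which the output above shows Ambari using, {{start_new_session}} is not available; passing {{preexec_fn=os.setsid}} to {{subprocess.Popen}} should have a similar effect, though that variant is not tested here.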
The second {{ambari-server restart}} then calls the {{stop()}} function from
{{/usr/sbin/ambari-server.py}}, which kills all processes in the same
process group as ambari-server (code {{os.killpg(os.getpgid(pid), signal.SIGKILL)}}, where {{pid}} is the PID of the running ambari-server process). Because the test script is in that group, it is killed as well, hence the exit status 137 (128 + 9, where 9 is SIGKILL).
There would be nothing wrong with this if the Ambari daemon process
created a new process group for itself, but it does not, and that is the root cause of
the bug.
h3. Deeper debugging
You can check the PGIDs with the ps command: {{ps -e --forest -o pgrp,args}}.
You can also add the following lines to the {{test-ambari-server.sh}} script after
the first {{ambari-server restart}} command:
{noformat}
echo "shell pid: $$"
ps -o pid,ppid,pgrp -p $(cat /var/run/ambari-server/ambari-server.pid)
{noformat}
When you run the {{test-ambari-server.sh}} script again, you will
see that the ambari-server process belongs to the process group of the
shell (the PGRP, aka PGID, of the shell is the same as its PID in this case):
{noformat}
+ echo 'shell pid: 9368'
shell pid: 9368
++ cat /var/run/ambari-server/ambari-server.pid
+ ps -o pid,ppid,pgrp -p 9415
PID PPID PGRP
9415 1 9368
{noformat}