Posted to dev@ambari.apache.org by "Nahappan Somasundaram (JIRA)" <ji...@apache.org> on 2015/09/04 02:37:46 UTC
[jira] [Created] (AMBARI-13007) Stopping ambari-server may kill ambari-agent running on the same machine in some cases
Nahappan Somasundaram created AMBARI-13007:
----------------------------------------------
Summary: Stopping ambari-server may kill ambari-agent running on the same machine in some cases
Key: AMBARI-13007
URL: https://issues.apache.org/jira/browse/AMBARI-13007
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.2.0
Reporter: Nahappan Somasundaram
Assignee: Nahappan Somasundaram
Fix For: 2.2.0
I launch multi-node Ambari clusters with a simple Python script. It logs in to every node via ssh and runs the following shell script:
{code}
#!/usr/bin/env bash
# Parse options: --server <hostname> [--noserver]
while (( $# > 0 ))
do
  key="$1"
  case ${key} in
    --server)
      ASERVER="$2" # Server hostname
      shift # past argument
      ;;
    --noserver)
      NOSERVER="NOSERVER" # Don't install/start server
      ;;
    *)
      echo "unknown option: ${key}" >&2
      exit 1
      ;;
  esac
  shift # past argument or value
done
yum clean all
curl http://s3.amazonaws.com/dev.hortonworks.com/ambari/centos6/2.x/latest/trunk/ambaribn.repo > /etc/yum.repos.d/ambari.repo
# Server
if [ "${ASERVER}" = "$(hostname -f)" ] && [ -z "${NOSERVER}" ]; then
  yum install sudo postgresql-server wget -y
  rpm -i /tmp/rpms/ambari-server*.rpm
  # Disable iptables
  iptables -F
  ambari-server setup -s
  # Enable remote debug
  sed -ri 's/-server -XX:NewRatio/-server -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -XX:NewRatio/g' /usr/sbin/ambari_server_main.py
  ## Sleep until debugger connects
  # sed -ri 's/dt_socket,server=y,suspend=.,address=5005/dt_socket,server=y,suspend=y,address=5005/g' /usr/sbin/ambari_server_main.py
  # Fix an issue with UI client version
  gunzip /usr/lib/ambari-server/web/javascripts/app.js.gz
  amb=$(ambari-server --version)
  sed -i "s/App\.version = '';/App.version = '$amb';/" /usr/lib/ambari-server/web/javascripts/app.js
  gzip /usr/lib/ambari-server/web/javascripts/app.js
  # Increase task timeouts
  sed -ri 's/agent.package.install.task.timeout=1800/agent.package.install.task.timeout=3600/g' /etc/ambari-server/conf/ambari.properties
  find /var/lib/ambari-server/resources/ -name metainfo.xml -exec sed -ri 's/<timeout>[[:digit:]]+<\//<timeout>1800<\//g' {} +
  # Start the server
  ambari-server start -v || exit 1
fi
# Agent
iptables -F
yum clean all
yum install -y wget
rpm -i /tmp/rpms/ambari-agent*.rpm
# Replace server hostname
sed -ri "s/hostname=localhost/hostname=${ASERVER}/g" /etc/ambari-agent/conf/ambari-agent.ini
# Enable debug mode at agent
# sed -ri 's/=INFO/=DEBUG/g' /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent start || exit 1
{code}
When I stop or restart ambari-server, the agent running on the same node is killed every time. This happens because the agent is launched in the same process group as ambari-server, and ambari-server kills every process that belongs to its process group. I assume this situation is common when ambari-server and ambari-agent are launched from the same shell script via ssh, and possibly also via configuration management tools such as Puppet or Chef (I have not checked this assumption).
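The mechanism can be reproduced without Ambari. In this minimal sketch, two `sleep` processes stand in for ambari-server and ambari-agent; the helper `pgid_of` and all variable names are hypothetical, and the script assumes bash on Linux (it reads `/proc`). Because a non-interactive script does not use job control, every background child stays in the script's own process group, which is the same situation as in the ps output in this report.

{code}
#!/usr/bin/env bash
# Sketch, not Ambari code: `sleep` stands in for ambari-server and ambari-agent.
# Children started from one non-interactive script inherit its process group.
set -eu

# Print the process group ID of PID $1 by reading /proc/PID/stat (Linux only).
# `ps -o pgid= -p PID` reports the same value.
pgid_of() {
  local stat pgrp
  stat=$(cat "/proc/$1/stat")
  # Format is "pid (comm) state ppid pgrp ..."; drop everything up to the
  # last ") " so that spaces in comm cannot shift the fields.
  stat=${stat##*") "}
  read -r _ _ pgrp _ <<< "$stat"
  echo "$pgrp"
}

sleep 300 & fake_server=$!   # hypothetical stand-in for the server JVM
sleep 300 & fake_agent=$!    # hypothetical stand-in for the agent

launcher_pgid=$(pgid_of $$)
server_pgid=$(pgid_of "$fake_server")
agent_pgid=$(pgid_of "$fake_agent")
echo "launcher=$launcher_pgid server=$server_pgid agent=$agent_pgid"

# Signalling group $server_pgid would reach all three processes, including
# this script, so clean up by PID only.
kill "$fake_server" "$fake_agent"
wait "$fake_server" "$fake_agent" 2>/dev/null || true
{code}

All three printed values are identical, so a stop routine that signals the server's process group cannot avoid hitting the agent.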
*More info:*
{code}
[root@dlysnichenko-ru3-1 ~]# ps -ejH
PID PGID SID TTY TIME CMD
1584 1584 1584 ? 00:00:00 sshd
2659 2659 2659 ? 00:00:00 sshd
2662 2662 2662 pts/0 00:00:00 bash
3268 3268 2662 pts/0 00:00:00 ps
2056 2041 2041 ? 00:00:00 postmaster
2058 2058 2058 ? 00:00:00 postmaster
2060 2060 2060 ? 00:00:00 postmaster
2061 2061 2061 ? 00:00:00 postmaster
2062 2062 2062 ? 00:00:00 postmaster
2063 2063 2063 ? 00:00:00 postmaster
2380 2380 2380 ? 00:00:00 postmaster
2397 2397 2397 ? 00:00:00 postmaster
2649 2649 2649 ? 00:00:01 postmaster
2654 2654 2654 ? 00:00:00 postmaster
2655 2655 2655 ? 00:00:00 postmaster
2656 2656 2656 ? 00:00:00 postmaster
2360 1644 1644 ? 00:00:59 java
2507 1644 1644 ? 00:00:00 python2.6
2515 1644 1644 ? 00:00:01 python2.6
3230 3230 3230 ? 00:00:00 anacron
[root@dlysnichenko-ru3-1 ~]# ambari-agent status
Found ambari-agent PID: 2515
ambari-agent running.
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
[root@dlysnichenko-ru3-1 ~]# ambari-server stop
Using python /usr/bin/python2.6
Stopping ambari-server
Ambari Server stopped
[root@dlysnichenko-ru3-1 ~]# ambari-agent status
Found ambari-agent PID: 2515
ambari-agent not running. Stale PID File at: /var/run/ambari-agent/ambari-agent.pid
[root@dlysnichenko-ru3-1 ~]#
{code}
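The PGID column above is what matters here. To list every process that a group-wide kill would reach on such a node, one can scan `/proc` for processes whose PGID matches the agent's. This is a sketch assuming bash on Linux; `pgid_of` and `group_members` are hypothetical helpers, and only the pid-file path comes from the `ambari-agent status` output above.

{code}
#!/usr/bin/env bash
# Sketch: list the PIDs that belong to one process group by scanning /proc.
# Linux only; pgid_of and group_members are hypothetical helpers, not Ambari code.
set -u

# Print the process group ID of PID $1, or nothing if the process has exited.
pgid_of() {
  local stat pgrp
  stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 0
  # "pid (comm) state ppid pgrp ...": skip past the last ") " first.
  stat=${stat##*") "}
  read -r _ _ pgrp _ <<< "$stat"
  echo "$pgrp"
}

# Print every PID whose process group is $1, one per line.
group_members() {
  local dir pid
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    if [ "$(pgid_of "$pid")" = "$1" ]; then
      echo "$pid"
    fi
  done
}

# On an Ambari node, before stopping the server, the agent's group can be
# inspected with:
#   agent_pid=$(cat /var/run/ambari-agent/ambari-agent.pid)
#   ps -o pid,pgid,cmd -p "$(group_members "$(pgid_of "$agent_pid")" | paste -sd, -)"
# Run without arguments, the script lists its own group as a self-check.
group_members "${1:-$(pgid_of $$)}"
{code}

If the server's java process shows up in that list, stopping the server by process group will also take down the agent.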
Note that the agent (PID 2515) and the server share the same process group, 1644 (see the PGID column in the ps output above). We should either not kill the whole process group when stopping ambari-server, or create a dedicated process group when launching it.
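The second option can be sketched with setsid(1) from util-linux: a command started through it becomes the leader of a new session and process group, so a group-wide kill aimed at it no longer reaches processes left in the launcher's group. As before, `sleep` stands in for both daemons and the helper and variable names are hypothetical; this demonstrates the mechanism under those assumptions and is not a patch to ambari-server.

{code}
#!/usr/bin/env bash
# Sketch of "create a dedicated process group when launching": start the
# stand-in server via setsid(1). `sleep` stands in for both daemons.
set -eu

# Print the process group ID of PID $1 from /proc/PID/stat (Linux only).
pgid_of() {
  local stat pgrp
  stat=$(cat "/proc/$1/stat")
  stat=${stat##*") "}            # skip "pid (comm) " safely
  read -r _ _ pgrp _ <<< "$stat" # fields: state ppid pgrp ...
  echo "$pgrp"
}

sleep 300 & agent=$!            # stays in the launcher's group, like the agent
setsid sleep 300 & server=$!    # new session, so its PGID becomes its own PID

# setsid(1) calls setsid(2) only after the fork, so poll briefly until the
# child leads its own group.
for _ in {1..50}; do
  [ "$(pgid_of "$server")" = "$server" ] && break
  sleep 0.1
done
server_pgid=$(pgid_of "$server")
agent_pgid=$(pgid_of "$agent")
if [ "$server_pgid" != "$server" ]; then
  # Never signal a group we do not own: it could be this script's group.
  echo "setsid did not take effect; refusing to signal the group" >&2
  kill "$server" "$agent"
  exit 1
fi

# A group-wide stop of the server now only reaches the server's own group.
kill -TERM -- "-$server_pgid"
wait "$server" 2>/dev/null || true

if kill -0 "$agent" 2>/dev/null; then agent_survived=yes; else agent_survived=no; fi
echo "server pgid=$server_pgid agent pgid=$agent_pgid agent survived: $agent_survived"

kill "$agent"
wait "$agent" 2>/dev/null || true
{code}

The first option roughly corresponds to signalling only the server's PID (`kill "$server"` in the sketch) instead of `-$server_pgid`. Applied to the launch script, the second option would presumably become `setsid ambari-server start -v || exit 1`; that is an untested assumption about how ambari-server behaves when started this way.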
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)