Posted to dev@ambari.apache.org by "Jayush Luniya (JIRA)" <ji...@apache.org> on 2015/09/11 22:09:45 UTC
[jira] [Commented] (AMBARI-13007) Stopping ambari-server may kill
ambari-agent running on the same machine in some cases
[ https://issues.apache.org/jira/browse/AMBARI-13007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741469#comment-14741469 ]
Jayush Luniya commented on AMBARI-13007:
----------------------------------------
commit 02fcea6c3c38f2fff4e73bdc3d235cf87477d4e2
Author: Jayush Luniya <jl...@hortonworks.com>
Date: Fri Sep 11 13:08:25 2015 -0700
AMBARI-13007: Stopping ambari-server may kill ambari-agent running on the same machine in some cases (Nahappan Somasundaram via jluniya)
> Stopping ambari-server may kill ambari-agent running on the same machine in some cases
> --------------------------------------------------------------------------------------
>
> Key: AMBARI-13007
> URL: https://issues.apache.org/jira/browse/AMBARI-13007
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.2.0
> Reporter: Nahappan Somasundaram
> Assignee: Nahappan Somasundaram
> Fix For: 2.2.0
>
>
> I launch multinode Ambari clusters using a simple Python script. It logs in to every node via ssh and runs this shell script:
> {code}
> #!/usr/bin/env bash
> while [[ $# -gt 0 ]]
> do
> key="$1"
> case ${key} in
> --server)
> ASERVER="$2" # Server hostname
> shift # past argument
> ;;
> --noserver)
> NOSERVER="NOSERVER" # Don't install/start server
> ;;
> *)
> echo unknown option
> exit 1
> ;;
> esac
> shift # past argument or value
> done
> yum clean all
> curl http://s3.amazonaws.com/dev.hortonworks.com/ambari/centos6/2.x/latest/trunk/ambaribn.repo > /etc/yum.repos.d/ambari.repo
> # server
> if [ "${ASERVER}" = "$(hostname -f)" ] && [ -z "${NOSERVER}" ] ; then
> yum install sudo postgresql-server wget -y
> rpm -i /tmp/rpms/ambari-server*.rpm
> # Disable iptables
> iptables -F
> ambari-server setup -s
> # Enable remote debug
> sed -ri -e 's/-server -XX:NewRatio/-server -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -XX:NewRatio/g' /usr/sbin/ambari_server_main.py
> ## Sleep until debugger connects
> # sed -rie 's/dt_socket,server=y,suspend=.,address=5005/dt_socket,server=y,suspend=y,address=5005/g' /usr/sbin/ambari-server.py
> # Fix an issue with UI client version
> gunzip /usr/lib/ambari-server/web/javascripts/app.js.gz
> amb=$(ambari-server --version); sed -i "s/App\.version = '';/App.version = '$amb';/" /usr/lib/ambari-server/web/javascripts/app.js
> gzip /usr/lib/ambari-server/web/javascripts/app.js
> # Increase task timeout
> sed -ri 's/agent.package.install.task.timeout=1800/agent.package.install.task.timeout=3600/g' /etc/ambari-server/conf/ambari.properties
> find /var/lib/ambari-server/resources/ -name metainfo.xml | xargs -L 1 sed -ri 's/<timeout>[[:digit:]]+[[:digit:]]*<\//<timeout>1800<\//g'
> # Start the server
> ambari-server start -v || exit 1
> fi
> # Agent
> iptables -F
> yum clean all
> yum install -y wget
> rpm -i /tmp/rpms/ambari-agent*.rpm
> # Replace server hostname
> sed -ri -e "s/hostname=localhost/hostname=$ASERVER/g" /etc/ambari-agent/conf/ambari-agent.ini
> # Enable debug mode at agent
> # sed -rie 's/=INFO/=DEBUG/g' /etc/ambari-agent/conf/ambari-agent.ini
> ambari-agent start || exit 1
> {code}
> When I restart ambari-server, the agent running on the same node is always killed. This happens because the agent is launched in the same process group as ambari-server, and stopping ambari-server kills everything that belongs to its process group. I assume the same happens whenever ambari-server and ambari-agent are launched from one shell script via ssh, and possibly also when they are launched by configuration management tools such as Puppet or Chef (I did not check this assumption).
> *More info:*
> {code}
> [root@dlysnichenko-ru3-1 ~]# ps -ejH
> PID PGID SID TTY TIME CMD
> 1584 1584 1584 ? 00:00:00 sshd
> 2659 2659 2659 ? 00:00:00 sshd
> 2662 2662 2662 pts/0 00:00:00 bash
> 3268 3268 2662 pts/0 00:00:00 ps
> 2056 2041 2041 ? 00:00:00 postmaster
> 2058 2058 2058 ? 00:00:00 postmaster
> 2060 2060 2060 ? 00:00:00 postmaster
> 2061 2061 2061 ? 00:00:00 postmaster
> 2062 2062 2062 ? 00:00:00 postmaster
> 2063 2063 2063 ? 00:00:00 postmaster
> 2380 2380 2380 ? 00:00:00 postmaster
> 2397 2397 2397 ? 00:00:00 postmaster
> 2649 2649 2649 ? 00:00:01 postmaster
> 2654 2654 2654 ? 00:00:00 postmaster
> 2655 2655 2655 ? 00:00:00 postmaster
> 2656 2656 2656 ? 00:00:00 postmaster
> 2360 1644 1644 ? 00:00:59 java
> 2507 1644 1644 ? 00:00:00 python2.6
> 2515 1644 1644 ? 00:00:01 python2.6
> 3230 3230 3230 ? 00:00:00 anacron
> [root@dlysnichenko-ru3-1 ~]# ambari-agent status
> Found ambari-agent PID: 2515
> ambari-agent running.
> Agent PID at: /var/run/ambari-agent/ambari-agent.pid
> Agent out at: /var/log/ambari-agent/ambari-agent.out
> Agent log at: /var/log/ambari-agent/ambari-agent.log
> [root@dlysnichenko-ru3-1 ~]# ambari-server stop
> Using python /usr/bin/python2.6
> Stopping ambari-server
> Ambari Server stopped
> [root@dlysnichenko-ru3-1 ~]# ambari-agent status
> Found ambari-agent PID: 2515
> ambari-agent not running. Stale PID File at: /var/run/ambari-agent/ambari-agent.pid
> [root@dlysnichenko-ru3-1 ~]#
> {code}
> Note: the agent and the server share the same process group, 1644. We should either not kill the whole process group when stopping ambari-server, or create a dedicated process group when launching it.
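
To illustrate the process-group point in the note above, here is a minimal sketch. It is not the committed fix. It assumes a Linux host with /proc mounted and the util-linux `setsid` utility installed; the variable names are made up for the illustration. Two children started the usual way from one script end up in the same process group, while a child started through `setsid` gets a group of its own.

```shell
#!/usr/bin/env bash
# Field 5 of /proc/<pid>/stat is the process group ID (Linux only).
# Reading /proc/self/stat makes each `cut` process report its own group.

# Two children started the usual way: both inherit the script's process
# group, much like the server and the agent in the report above.
server_like_pgid=$(cut -d' ' -f5 /proc/self/stat)
agent_like_pgid=$(cut -d' ' -f5 /proc/self/stat)

# A child started through setsid (assumed to be installed) becomes the
# leader of a new session, and therefore of a new process group.
isolated_pgid=$(setsid cut -d' ' -f5 /proc/self/stat)

echo "server-like=${server_like_pgid} agent-like=${agent_like_pgid} isolated=${isolated_pgid}"
```

In the launch script from the report, the equivalent workaround would presumably be to start the agent as `setsid ambari-agent start`, so that a group-wide kill aimed at the server no longer reaches it. That variant has not been tested here.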