Posted to dev@ambari.apache.org by "Dmitry Lysnichenko (JIRA)" <ji...@apache.org> on 2014/11/06 17:22:33 UTC

[jira] [Created] (AMBARI-8185) Services fail to start when pid file is empty

Dmitry Lysnichenko created AMBARI-8185:
------------------------------------------

             Summary: Services fail to start when pid file is empty
                 Key: AMBARI-8185
                 URL: https://issues.apache.org/jira/browse/AMBARI-8185
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 1.6.1
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
             Fix For: 2.0.0


Witnessed at a customer site:
* Storm Supervisor server had a pid file at {{/var/run/storm/supervisor.pid}}
* This file, while present, had no content
* The stack file {{service.py}} detects a running process using this call:
{noformat}
  no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps `cat {pid_file}` >/dev/null 2>&1")
{noformat}
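* When the file is empty, {{cat}} outputs nothing, so {{ps}} is invoked with no arguments. It simply lists processes and returns 0 (success), so the service looks as if it is already running and the start command does not run.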
* Changed the command to
{noformat}
  no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps -p `cat {pid_file}` >/dev/null 2>&1")
{noformat}
which correctly reports that the process is not running (with an empty pid file, {{ps -p}} receives no process ID and fails), so startup can continue.
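For code that would rather check the pid file in Python than shell out, a stricter check could look like the minimal sketch below. It assumes Python 3 on a Unix host, and the helper names ({{parse_pid}}, {{pid_alive}}, {{is_process_running}}) are hypothetical, not part of Ambari's resource_management library. A missing, empty, or non-numeric pid file is treated as "not running", and liveness is tested by sending signal 0 with {{os.kill}}, which checks for the process without delivering anything.

{code:python}
import os
import re


def parse_pid(content):
    """Return the PID stored in content, or None if it is empty or invalid."""
    content = content.strip()
    # Accept only plain ASCII digits; an empty string fails this check,
    # which is exactly the case that fooled the shell test.
    if not re.fullmatch(r"[0-9]+", content):
        return None
    pid = int(content)
    # PID 0 would address the whole process group in os.kill, so reject it.
    return pid if pid > 0 else None


def pid_alive(pid):
    """Return True if a process with this PID exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    except OverflowError:
        return False  # value too large to be a real PID
    return True


def is_process_running(pid_file):
    """Treat a missing, unreadable, empty, or non-numeric pid file as not running."""
    try:
        with open(pid_file) as f:
            pid = parse_pid(f.read())
    except OSError:
        return False
    return pid is not None and pid_alive(pid)
{code}

Unlike the shell test, this never runs {{ps}} at all. It shares one limitation with {{ps -p}}: if the PID has been recycled by an unrelated process, the service still reads as running.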

The customer reports having seen this behavior with other services as well, but could not reproduce it on-site. This pattern is used frequently throughout the code base and should be fixed for all services, including Storm. Validating this change is the critical task here: the change itself is small, but its effects are broad in scope.

Also, ambari/ambari-agent/conf/unix/ambari-agent contains a few invocations of similar code that have a different bug:
{code}
          PID=`cat $PIDFILE`
          echo "Found $AMBARI_AGENT PID: $PID"
          if [ -z "`ps ax -o pid | grep $PID`" ]; then
{code}
Here, if $PID is, for example, 2111 and a process with PID 22111 is running, {{grep}} matches the substring and we get a false positive: the agent refuses to start, saying it is already running.
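To illustrate why the {{grep}} check misfires, the hedged Python sketch below mimics it on sample {{ps ax -o pid}} output and contrasts it with comparing whole PID fields. The function names and the sample output are made up for illustration. In the shell script itself, the equivalent fixes would be to use {{ps -p "$PID"}}, as in the service fix, or to make grep match whole words with {{grep -w}}.

{code:python}
def listed_by_substring(pid, ps_output):
    # Mimics `ps ax -o pid | grep $PID`: any line *containing* the PID matches.
    return any(pid in line for line in ps_output.splitlines())


def listed_exactly(pid, ps_output):
    # Compares each whole PID field; `ps ax -o pid` right-aligns values
    # with leading spaces, so strip them before comparing.
    pid = pid.strip()
    if not pid:
        return False  # an empty PID (empty pid file) never counts as running
    return any(line.strip() == pid for line in ps_output.splitlines())


# Hypothetical `ps ax -o pid` output: a header plus two processes.
sample = "  PID\n    1\n22111\n"

print(listed_by_substring("2111", sample))  # True  (false positive from 22111)
print(listed_exactly("2111", sample))       # False
print(listed_exactly("22111", sample))      # True
{code}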



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)