Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/09/03 20:17:52 UTC

[jira] [Created] (MESOS-676) Slave::reregistered LOG(FATAL)s due to being in RECOVERING state.

Benjamin Mahler created MESOS-676:
-------------------------------------

             Summary: Slave::reregistered LOG(FATAL)s due to being in RECOVERING state.
                 Key: MESOS-676
                 URL: https://issues.apache.org/jira/browse/MESOS-676
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Assignee: Benjamin Mahler
             Fix For: 0.14.0


void Slave::reregistered(const SlaveID& slaveId)
{
  switch(state) {
    case DISCONNECTED:
      LOG(INFO) << "Re-registered with master " << master;

      state = RUNNING;
      if (!(info.id() == slaveId)) {
        EXIT(1) << "Re-registered but got wrong id: " << slaveId
                << "(expected: " << info.id() << "). Committing suicide";
      }
      break;
    case RUNNING:
      // Already re-registered!
      if (!(info.id() == slaveId)) {
        EXIT(1) << "Re-registered but got wrong id: " << slaveId
                << "(expected: " << info.id() << "). Committing suicide";
      }
      LOG(WARNING) << "Already re-registered with master " << master;
      break;
    case TERMINATING:
      LOG(WARNING) << "Ignoring re-registration because slave is terminating";
      break;
    case RECOVERING:
    default:
      LOG(FATAL) << "Unexpected slave state " << state;
      break;
  }
}

A slave aborted because it hit the last case of this switch, where RECOVERING falls through to default and triggers LOG(FATAL):

F0903 02:01:26.436521 42417 slave.cpp:672] Unexpected slave state 0
*** Check failure stack trace: ***
    @     0x7f042c579d8d  google::LogMessage::Fail()
    @     0x7f042c57dd77  google::LogMessage::SendToLog()
    @     0x7f042c57c674  google::LogMessage::Flush()
    @     0x7f042c57c8a6  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f042c21db8a  mesos::internal::slave::Slave::reregistered()
    @     0x7f042c276c1d  ProtobufProcess<>::handler1<>()
    @     0x7f042c24560a  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7f042c27702b  ProtobufProcess<>::visit()
    @     0x7f042c46baf4  process::ProcessManager::resume()
    @     0x7f042c46c54f  process::schedule()
    @     0x7f042bbd983d  start_thread
    @     0x7f042a5bbf8d  clone
/usr/local/bin/mesos-slave.sh: line 117: 42408 Aborted                 (core dumped) /usr/local/sbin/mesos-slave --port=5051 --resources="${MESOS_RESOURCES}" --attributes="${MESOS_ATTRIBUTES}" --master="${master_zoo_url}" --log_dir="${log_dir}" ${EXTRA_FLAGS} "$@"
Slave Exit Status: 134
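
The fall-through can be sketched in isolation as below. This is a minimal stand-in, not the actual slave code: the State enum and its ordering are assumptions (RECOVERING is placed first only so that it prints as 0, matching "Unexpected slave state 0" in the log), and reregisteredOutcome is a hypothetical helper that names the branch taken instead of logging or aborting.

```cpp
#include <string>

// Hypothetical stand-in for the slave's state enum. The ordering is an
// assumption: RECOVERING comes first so that its value is 0, as in the log.
enum State { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Hypothetical helper: returns a label for the branch that a switch shaped
// like the one in Slave::reregistered would take for the given state.
std::string reregisteredOutcome(State state)
{
  switch (state) {
    case DISCONNECTED:
      return "re-registered";
    case RUNNING:
      return "already re-registered";
    case TERMINATING:
      return "ignored";
    case RECOVERING:  // No handling of its own: falls through to default.
    default:
      return "fatal";  // Corresponds to the LOG(FATAL) branch.
  }
}
```

With these assumptions, reregisteredOutcome(RECOVERING) yields "fatal", i.e. a re-registration message that arrives while the slave is still recovering takes the aborting branch.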
