Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2012/06/23 04:18:42 UTC
[jira] [Created] (MESOS-218) Master throws exception on removeTask() if Framework is not connected
Vinod Kone created MESOS-218:
--------------------------------
Summary: Master throws exception on removeTask() if Framework is not connected
Key: MESOS-218
URL: https://issues.apache.org/jira/browse/MESOS-218
Project: Mesos
Issue Type: Bug
Reporter: Vinod Kone
When a slave disconnects from the master, the master removes all tasks belonging to that slave.
If a framework is disconnected during this period, removeTask() throws an exception (the "Check failed: framework != NULL" failure in the log below). As a result, LOST tasks may not be reported to the scheduler. This is bad because the framework still thinks the task is running, while the executor does not. The TASK_KILLED messages from the executor do not help either: the (restarted) slave drops them because it has no record of the task.
I0623 00:58:36.758640 28346 master.cpp:1694] Adding slave 201206230058-1937777162-5050-28332-0 at smf1-afg-23-sr3.prod.twitter.com with cpus=14; mem=22528; ports=[31000-32000]; disk=400000
I0623 00:58:36.758826 28346 simple_allocator.cpp:69] Added slave 201206230058-1937777162-5050-28332-0 with cpus=14; mem=22528; ports=[31000-32000]; disk=400000
I0623 00:58:36.761170 28344 master.cpp:941] Attempting to register slave on smf1-aff-31-sr4.prod.twitter.com at slave(1)@10.34.135.131:5051
I0623 00:58:36.761245 28344 master.cpp:1158] Master now considering a slave at smf1-aff-31-sr4.prod.twitter.com:5051 as active
I0623 00:58:36.761275 28344 master.cpp:1694] Adding slave 201206230058-1937777162-5050-28332-1 at smf1-aff-31-sr4.prod.twitter.com with cpus=14; mem=22528; ports=[31000-32000]; disk=400000
I0623 00:58:36.761489 28344 simple_allocator.cpp:69] Added slave 201206230058-1937777162-5050-28332-1 with cpus=14; mem=22528; ports=[31000-32000]; disk=400000
2012-06-23 00:58:39,871:28332(0x4955b940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
I0623 00:58:39.910228 28342 master.cpp:70] Watching path file:///usr/local/mesos/conf/whitelist.txt
I0623 00:58:39.910339 28342 master.cpp:98] Whitelisting slave smf1-afg-23-sr3.prod.twitter.com
I0623 00:58:39.910395 28342 master.cpp:98] Whitelisting slave smf1-aff-31-sr4.prod.twitter.com
2012-06-23 00:58:43,208:28332(0x4955b940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
I0623 00:58:44.911403 28346 master.cpp:70] Watching path file:///usr/local/mesos/conf/whitelist.txt
I0623 00:58:44.911511 28346 master.cpp:98] Whitelisting slave smf1-afg-23-sr3.prod.twitter.com
I0623 00:58:44.911541 28346 master.cpp:98] Whitelisting slave smf1-aff-31-sr4.prod.twitter.com
2012-06-23 00:58:46,545:28332(0x4955b940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
I0623 00:58:49.738129 28345 master.cpp:548] Slave 201206160031-1937777162-5050-11967-3 disconnected
F0623 00:58:49.738231 28345 master.cpp:1880] Check failed: framework != NULL
*** Check failure stack trace: ***
@ 0x7f032d18e3fd google::LogMessage::Fail()
@ 0x7f032d194067 google::LogMessage::SendToLog()
@ 0x7f032d18fcac google::LogMessage::Flush()
@ 0x7f032d18ff16 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f032cedb462 mesos::internal::master::Master::removeTask()
@ 0x7f032cee58d6 mesos::internal::master::Master::removeSlave()
@ 0x7f032cee7b6e mesos::internal::master::Master::exited()
@ 0x7f032d0ac3f2 process::ProcessBase::visit()
@ 0x7f032d0be4f6 process::ExitedEvent::visit()
@ 0x7f032d0b7054 process::ProcessManager::resume()
@ 0x7f032d0b78a7 process::schedule()
@ 0x7f032c5f573d start_thread
@ 0x7f032bbdff6d clone
Bottle server starting up (using WSGIRefServer())...
Listening on http://0.0.0.0:8080/
Use Ctrl-C to quit.
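
One possible direction, sketched under heavy assumptions: the trace above shows removeSlave() calling removeTask(), which aborts on the "framework != NULL" check. The snippet below is not the Mesos master code. Task, Framework, Master, lostTasks and pendingLost are all hypothetical, simplified stand-ins, used only to illustrate tolerating a disconnected framework and remembering the LOST task for later delivery instead of failing the check.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the master's bookkeeping.
// None of these types mirror the real Mesos classes.
struct Task {
  std::string id;
  std::string frameworkId;
  std::string slaveId;
};

struct Framework {
  std::string id;
  std::vector<std::string> lostTasks;  // TASK_LOST updates delivered so far
};

struct Master {
  std::map<std::string, Framework> frameworks;  // connected frameworks only
  std::map<std::string, Task> tasks;
  // Assumption: LOST updates for disconnected frameworks are parked here,
  // keyed by framework id, so they can be reported after re-registration.
  std::map<std::string, std::vector<std::string>> pendingLost;

  void addFramework(const std::string& id) {
    frameworks[id] = Framework{id, {}};
  }

  void addTask(const Task& task) { tasks[task.id] = task; }

  // Removes a task. A missing framework is treated as an expected state
  // rather than an invariant violation.
  void removeTask(const std::string& taskId) {
    auto t = tasks.find(taskId);
    if (t == tasks.end()) {
      return;  // Unknown task: nothing to do.
    }

    const Task task = t->second;
    tasks.erase(t);

    auto f = frameworks.find(task.frameworkId);
    if (f == frameworks.end()) {
      // Framework is not connected: do not abort, park the update instead.
      pendingLost[task.frameworkId].push_back(task.id);
      return;
    }

    f->second.lostTasks.push_back(task.id);
  }

  // Removes every task that was running on the given slave and
  // returns how many tasks were removed.
  int removeSlave(const std::string& slaveId) {
    std::vector<std::string> ids;
    for (const auto& entry : tasks) {
      if (entry.second.slaveId == slaveId) {
        ids.push_back(entry.first);
      }
    }
    for (const auto& id : ids) {
      removeTask(id);
    }
    return static_cast<int>(ids.size());
  }
};
```

With this shape, removing a slave whose tasks belong to a mix of connected and disconnected frameworks completes without aborting. Whether parked updates should be replayed on framework re-registration, or handled some other way, is a design decision this sketch leaves open.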