Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2015/11/17 07:48:11 UTC
[jira] [Commented] (MESOS-3719) Core dump on /teardown
[ https://issues.apache.org/jira/browse/MESOS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008152#comment-15008152 ]
Neil Conway commented on MESOS-3719:
------------------------------------
[~kensipe] Hi Ken -- any progress on getting the steps to reproduce this crash?
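In case it helps with scripting a repro, here is a rough sketch of the teardown call from the issue description, expressed in Python instead of curl. The helper name `build_teardown_request` is made up for illustration. The master URL and framework ID are copied from the description. The report mentions two frameworks but shows only one ID, so only that one appears here. The sketch only builds the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_teardown_request(master_url, framework_id):
    """Build (but do not send) a POST to /master/teardown for one framework.

    Hypothetical helper: it mirrors `curl -d "frameworkId=..." -X POST <url>`
    by sending the framework ID as a form-encoded request body.
    """
    body = urlencode({"frameworkId": framework_id}).encode("ascii")
    url = master_url.rstrip("/") + "/master/teardown"
    return Request(url, data=body, method="POST")


# Values taken from the issue description; the second framework ID is not given there.
MASTER_URL = "http://master.mesos:5050"
FRAMEWORK_IDS = ["20151013-143739-1510211594-5050-1515-0002"]

pending = [build_teardown_request(MASTER_URL, fid) for fid in FRAMEWORK_IDS]

# To actually issue the calls (only against a test cluster), pass each request
# to urllib.request.urlopen().
```

Sending the requests back to back, as the description suggests was done for the two frameworks, would be the obvious first thing to try, though whether timing matters here is not established.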
I notice that MESOS-3744 lists the assertion failure as:
{{sorter.cpp:213] Check failed: total.resources.contains(slaveId)}}
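To compare the two tickets mechanically, the fatal line can be pulled out of journal output like the log quoted below. This is a small sketch, assuming each entry carries the usual glog prefix (severity letter, `mmdd`, time, thread ID, `file:line]`) after the syslog header, as the quoted lines do. The names `GLOG_LINE` and `fatal_entries` are invented for this example.

```python
import re

# glog prefix: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread> <file>:<line>] <message>
# The leading (?:^|\s) lets the pattern match after a syslog header on the same line.
GLOG_LINE = re.compile(
    r"(?:^|\s)(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^\s:]+):(?P<line>\d+)\] (?P<message>.*)"
)


def fatal_entries(lines):
    """Return (file, line, message) for every FATAL ('F') glog entry found."""
    found = []
    for line in lines:
        match = GLOG_LINE.search(line)
        if match and match.group("severity") == "F":
            found.append(
                (match.group("file"), int(match.group("line")), match.group("message"))
            )
    return found
```

Running it over the log from both tickets and comparing the `(file, line, message)` tuples would show whether they hit the same check, keeping in mind that line numbers can shift between Mesos versions.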
> Core dump on /teardown
> ----------------------
>
> Key: MESOS-3719
> URL: https://issues.apache.org/jira/browse/MESOS-3719
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 0.24.1
> Reporter: Ken Sipe
>
> Invoked `/master/teardown` for 2 frameworks. A sample invocation (on the master node, using mesos-dns) is:
> {code}curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST http://master.mesos:5050/master/teardown{code}
> Logs at the master:
> {code}
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.677880 1525 http.cpp:321] HTTP POST for /master/teardown from 10.0.4.90:53789 with User-Agent='curl/7.42.1'
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679015 1525 master.cpp:5112] Removing framework 20151013-143739-1510211594-5050-1515-0002 (hdfs) at scheduler-a5388720-fcbf-4fd0-b01e-75712b12c99d@10.0.4.90:53903
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679280 1525 master.cpp:5576] Updating the latest state of task task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679385 1525 master.cpp:5644] Removing task task.journalnode.journalnode.NodeExecutor.1444747955695 with resources cpus(*):0.25; mem(*):691.2 of framework 20151013-143739-1510211594-5050-1515-0002 on
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679404 1521 hierarchical.hpp:814] Recovered cpus(*):0.25; mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679536 1525 master.cpp:5673] Removing executor 'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679641 1521 hierarchical.hpp:814] Recovered cpus(*):0.1; mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: F1013 15:36:24.679719 1521 sorter.cpp:213] Check failed: total.resources.contains(slaveId)
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: *** Check failure stack trace: ***
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86c9fd google::LogMessage::Fail()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86e89d google::LogMessage::SendToLog()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86c5ec google::LogMessage::Flush()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86f1be google::LogMessageFatal::~LogMessageFatal()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e3f2910 mesos::internal::master::allocator::DRFSorter::remove()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e2cc0bc mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e977551 process::ProcessManager::resume()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e97784f process::internal::schedule()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.684733 1520 http.cpp:321] HTTP GET for /master/state-summary from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d30ebc3 (unknown)
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6cb1266c (unknown)
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6c8552ed (unknown)
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Unit entered failed state.
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Failed with result 'signal'.
> Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Service hold-off time over, scheduling restart.
> Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: Starting Mesos Master...
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)