Posted to dev@myriad.apache.org by "Sarjeet Singh (JIRA)" <ji...@apache.org> on 2015/09/09 02:09:46 UTC
[jira] [Created] (MYRIAD-133) Multiple flexed up NMs try to run on same node, altogether.
Sarjeet Singh created MYRIAD-133:
------------------------------------
Summary: Multiple flexed up NMs try to run on same node, altogether.
Key: MYRIAD-133
URL: https://issues.apache.org/jira/browse/MYRIAD-133
Project: Myriad
Issue Type: Bug
Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: Sarjeet Singh
On a 3-node cluster running the latest build with the NM + executor merge, I am seeing an issue when flexing up multiple NM instances: several NMs try to start on the same node at the same time.
Here are the already-running tasks from Myriad, before flexing up the additional NMs:
[root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state
{"pendingTasks":[],
"stagingTasks":[],
"activeTasks":[
"jobhistory.jobhistory.a25e35c8-c551-498a-81ff-5b29389064c7",
"nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
"nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"],
"killableTasks":[]}
Then I flexed up 4 instances of the zero-profile NM. Note that only one node had no NM; the other two nodes were already running NMs (see above).
Here is the task status from Myriad immediately after the flex-up, and again once all NMs had reached the active state:
[root@qa101-137 ~]# curl -H "Content-Type: application/json" -X PUT \
    -d '{"instances":4, "profile":"zero"}' \
    http://testrm.marathon.mesos:8192/api/cluster/flexup
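For completeness, the same flex-up request can be issued programmatically. This is only a sketch: the URL and payload shape are taken from the curl command above, everything else is illustrative, and the actual network call is left commented out since it needs a live Myriad endpoint.

```python
import json

# Flex-up payload as used in the curl invocation above:
# 4 NodeManager instances with the "zero" (no-capacity) profile.
body = json.dumps({"instances": 4, "profile": "zero"})
print(body)

def flexup(url, body):
    """PUT the flexup request to Myriad's REST API (illustrative)."""
    import urllib.request
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)

# Would be invoked against the cluster from the session above:
# flexup("http://testrm.marathon.mesos:8192/api/cluster/flexup", body)
```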
[root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state | python -mjson.tool
{
    "activeTasks": [
        "jobhistory.jobhistory.a25e35c8-c551-498a-81ff-5b29389064c7",
        "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
        "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"
    ],
    "killableTasks": [],
    "pendingTasks": [
        "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
        "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
        "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993"
    ],
    "stagingTasks": [
        "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0"
    ]
}
[root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state | python -mjson.tool
{
    "activeTasks": [
        "jobhistory.jobhistory.a25e35c8-c551-498a-81ff-5b29389064c7",
        "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0",
        "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
        "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52",
        "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
        "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
        "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993"
    ],
    "killableTasks": [],
    "pendingTasks": [],
    "stagingTasks": []
}
On Mesos, all 4 NMs tried to start on a single node. They were all in RUNNING state at some point, and then moved to LOST state once things settled down. Myriad later moved the unsuccessful tasks from active back to pending:
[root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state | python -mjson.tool
{
    "activeTasks": [
        "jobhistory.jobhistory.a25e35c8-c551-498a-81ff-5b29389064c7",
        "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0",
        "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
        "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"
    ],
    "killableTasks": [],
    "pendingTasks": [
        "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
        "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
        "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993"
    ],
    "stagingTasks": []
}
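For tracking these state transitions across snapshots, the per-state task counts can be summarized directly from the /api/state payload. A minimal sketch, assuming only the response shape shown above (the sample mirrors the final snapshot: the one flexed-up NM that won the race stays active, the three losers fall back to pending):

```python
import json

def summarize(state_json):
    """Return {task state: count} for a Myriad /api/state response."""
    state = json.loads(state_json)
    return {key: len(state.get(key, []))
            for key in ("activeTasks", "stagingTasks",
                        "pendingTasks", "killableTasks")}

# Sample payload mirroring the final snapshot above.
sample = json.dumps({
    "activeTasks": [
        "jobhistory.jobhistory.a25e35c8-c551-498a-81ff-5b29389064c7",
        "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0",
    ],
    "killableTasks": [],
    "pendingTasks": [
        "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
        "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
        "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993",
    ],
    "stagingTasks": [],
})
print(summarize(sample))
# → {'activeTasks': 2, 'stagingTasks': 0, 'pendingTasks': 3, 'killableTasks': 0}
```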
Let me know if you need any additional details regarding the issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)