Posted to issues@mesos.apache.org by "David vonThenen (JIRA)" <ji...@apache.org> on 2016/08/17 20:54:20 UTC
[jira] [Created] (MESOS-6054) Agent Crash with Malformed UUID when
doing TaskUpdate
David vonThenen created MESOS-6054:
--------------------------------------
Summary: Agent Crash with Malformed UUID when doing TaskUpdate
Key: MESOS-6054
URL: https://issues.apache.org/jira/browse/MESOS-6054
Project: Mesos
Issue Type: Bug
Components: framework api
Affects Versions: 1.0.0
Environment: Ubuntu 14.04, Mesos 1.0.0-2.0.89.ubuntu1404, Marathon 1.1.2
Reporter: David vonThenen
Priority: Minor
When using the HTTP API with protobufs, a malformed UUID in a TaskUpdate (in this case, a UUID that had been base64 encoded) causes the agent on which the executor is running to crash and restart.
Here is a JSON dump of the protobuf used:
{code}
{
  "executor_id": {
    "value": "executor-scaleio1"
  },
  "framework_id": {
    "value": "ac8545a7-f8fc-431e-bc36-0239c4460658-0002"
  },
  "type": 2,
  "update": {
    "status": {
      "task_id": {
        "value": "scaleio1"
      },
      "state": 1,
      "source": 2,
      "executor_id": {
        "value": "executor-scaleio1"
      },
      "uuid": "WVdVd01EQTFNakF0TkdVeU9TMDBNell3TFdJMk4yUXRPR05sT1RFNU56VmlPREUw"
    }
  }
}
{code}
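For illustration, here is a minimal Python sketch of a client-side check that would catch this kind of value before it is sent. It assumes that the "uuid" field is a protobuf bytes field meant to carry the 16 raw bytes of a UUID, and that the JSON dump shows bytes fields base64 encoded. The helper names (decode_uuid_field, is_raw_uuid, new_status_uuid) are hypothetical and are not part of any Mesos API.

```python
import base64
import uuid

# Assumption: the status uuid is expected to be the 16 raw bytes of a UUID.
UUID_SIZE = 16


def decode_uuid_field(json_value: str) -> bytes:
    """Decode a bytes field as it appears in the JSON dump (base64 text)."""
    return base64.b64decode(json_value, validate=True)


def is_raw_uuid(raw: bytes) -> bool:
    """Return True only if the payload has the size of a raw UUID."""
    return len(raw) == UUID_SIZE


def new_status_uuid() -> bytes:
    """Generate a random UUID and return its raw 16-byte form."""
    return uuid.uuid4().bytes


# The value from the dump above.
reported = "WVdVd01EQTFNakF0TkdVeU9TMDBNell3TFdJMk4yUXRPR05sT1RFNU56VmlPREUw"
raw = decode_uuid_field(reported)
print(len(raw), is_raw_uuid(raw))

# A raw UUID survives the same round trip and passes the check.
good = new_status_uuid()
good_json = base64.b64encode(good).decode("ascii")
print(is_raw_uuid(decode_uuid_field(good_json)))
```

Under these assumptions, the value from the dump decodes to 48 bytes rather than 16, which is consistent with the UUID having been base64 encoded before it was placed in the message.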
On the master, it looks like it processes the ACCEPT calls, but immediately after it has processed all of them, the agents are disconnected:
{code}
...
...
I0816 17:53:09.974340 4010 master.cpp:3342] Processing ACCEPT call for offers: [ 2bf179c3-004a-49e3-98ab-5a75fa773522-O80 ] on agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) for framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework)
W0816 17:53:09.974578 4010 validation.cpp:647] Executor executor-scaleio4 for task scaleio4 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0816 17:53:09.974604 4010 validation.cpp:659] Executor executor-scaleio4 for task scaleio4 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0816 17:53:09.974645 4010 master.cpp:7439] Adding task scaleio4 with resources cpus(*):1; mem(*):2048 on agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:09.974668 4010 master.cpp:3831] Launching task scaleio4 of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) with resources cpus(*):1; mem(*):2048 on agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:11.306182 4010 master.cpp:1245] Agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) disconnected
I0816 17:53:11.306335 4010 master.cpp:2784] Disconnecting agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:11.306520 4010 master.cpp:2803] Deactivating agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:11.306676 4010 master.cpp:1264] Removing framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) from disconnected agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) because the framework is not checkpointing
I0816 17:53:11.306798 4010 master.cpp:6448] Removing framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) from agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:11.306882 4010 master.cpp:6833] Updating the state of task scaleio4 of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (latest state: TASK_LOST, status update state: TASK_LOST)
I0816 17:53:11.306778 4013 hierarchical.cpp:571] Agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 deactivated
I0816 17:53:11.307140 4010 master.cpp:6899] Removing task scaleio4 with resources cpus(*):1; mem(*):2048 of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 on agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:11.307312 4010 master.cpp:5190] Sending status update TASK_LOST for task scaleio4 of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 'Slave ec2-52-89-227-184.us-west-2.compute.amazonaws.com disconnected'
I0816 17:53:11.307533 4010 master.cpp:6928] Removing executor 'executor-scaleio4' with resources of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 on agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 (ec2-52-89-227-184.us-west-2.compute.amazonaws.com)
I0816 17:53:11.472939 4017 master.cpp:1245] Agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S4 at slave(1)@172.31.17.252:5051 (ec2-52-88-195-213.us-west-2.compute.amazonaws.com) disconnected
...
...
{code}
The agent receives the POST from the executor:
{code}
...
...
I0816 17:51:09.001888 1237 slave.cpp:4591] Current disk usage 31.86%. Max allowed age: 4.069593432939398days
I0816 17:52:09.002300 1236 slave.cpp:4591] Current disk usage 31.86%. Max allowed age: 4.069545128332523days
I0816 17:53:09.002799 1234 slave.cpp:4591] Current disk usage 31.86%. Max allowed age: 4.069496823725636days
I0816 17:53:10.033020 1240 slave.cpp:1495] Got assigned task scaleio3 for framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001
I0816 17:53:10.033210 1240 slave.cpp:1614] Launching task scaleio3 for framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001
I0816 17:53:10.033980 1240 paths.cpp:528] Trying to chown '/tmp/mesos/slaves/2bf179c3-004a-49e3-98ab-5a75fa773522-S5/frameworks/2bf179c3-004a-49e3-98ab-5a75fa773522-0001/executors/executor-scaleio3/runs/9aa4ee18-350b-4a65-a36b-eef9449f5d11' to user 'root'
I0816 17:53:10.036744 1240 slave.cpp:5674] Launching executor executor-scaleio3 of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 with resources in work directory '/tmp/mesos/slaves/2bf179c3-004a-49e3-98ab-5a75fa773522-S5/frameworks/2bf179c3-004a-49e3-98ab-5a75fa773522-0001/executors/executor-scaleio3/runs/9aa4ee18-350b-4a65-a36b-eef9449f5d11'
I0816 17:53:10.036864 1240 slave.cpp:1840] Queuing task 'scaleio3' for executor 'executor-scaleio3' of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001
I0816 17:53:10.036898 1237 containerizer.cpp:781] Starting container '9aa4ee18-350b-4a65-a36b-eef9449f5d11' for executor 'executor-scaleio3' of framework '2bf179c3-004a-49e3-98ab-5a75fa773522-0001'
I0816 17:53:10.037387 1240 linux_launcher.cpp:281] Cloning child process with flags =
I0816 17:53:10.457927 1234 http.cpp:270] HTTP POST for /slave(1)/api/v1/executor from 172.31.23.107:49326 with User-Agent='scaleio/0.1'
I0816 17:53:10.458055 1234 slave.cpp:2661] Received Subscribe request for HTTP executor 'executor-scaleio3' of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001
I0816 17:53:10.462604 1234 slave.cpp:2005] Sending queued task 'scaleio3' to executor 'executor-scaleio3' of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (via HTTP)
I0816 17:53:11.464956 1233 http.cpp:270] HTTP POST for /slave(1)/api/v1/executor from 172.31.23.107:49328 with User-Agent='scaleio/0.1'
{code}
The agent then crashes and restarts with a new agent log:
{code}
Log file created at: 2016/08/16 17:53:11
Running on machine: ec2-52-38-65-6.us-west-2.compute.amazonaws.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0816 17:53:11.674993 4977 logging.cpp:194] INFO level logging started!
I0816 17:53:11.678026 4977 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I0816 17:53:11.681545 4977 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0816 17:53:11.682831 4977 main.cpp:434] Starting Mesos agent
...
...
{code}