Posted to reviews@mesos.apache.org by Klaus Ma <kl...@cguru.net> on 2015/09/05 05:27:30 UTC
Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a
framework uses duplicated task id)
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------
(Updated Sept. 5, 2015, 3:27 a.m.)
Review request for mesos and Vinod Kone.
Changes
-------
Add summary & description
Summary (updated)
-----------------
MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)
Bugs: MESOS-3070
https://issues.apache.org/jira/browse/MESOS-3070
Repository: mesos
Description (updated)
-------
__Phenomenon:__
The master crashes because of a duplicated task ID.
__Root Cause:__
Task IDs are stored on the slave (agent). After a master failover, there is a time window in which a new slave can launch a task with the same task ID; when the old task is then re-registered, the master crashes because of the duplicated task ID.
__Solution:__
Store task info in Master::Framework keyed by SlaveID, so the same task ID on two slaves no longer conflicts.
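To illustrate the idea behind the solution, here is a minimal sketch. It is not the code from this patch: the types TaskID, SlaveID, Task, FlatTaskIndex and PerSlaveTaskIndex are hypothetical stand-ins built on the standard library, chosen only to show why an index keyed by task ID alone rejects the re-registered task while an index keyed by slave ID first can hold both.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the Mesos ID types (assumption: plain strings
// are enough to illustrate the keying scheme).
using TaskID = std::string;
using SlaveID = std::string;

struct Task {
  TaskID id;
  SlaveID slaveId;
};

// Index keyed by task ID only: a second task with the same ID on a
// different slave cannot be stored. This models the duplicate that the
// master currently treats as a fatal condition.
struct FlatTaskIndex {
  std::map<TaskID, Task> tasks;

  // Returns false when the task ID is already present.
  bool add(const Task& task) {
    return tasks.emplace(task.id, task).second;
  }
};

// Index keyed by slave ID first, then task ID: the same task ID may
// appear once per slave.
struct PerSlaveTaskIndex {
  std::map<SlaveID, std::map<TaskID, Task>> tasks;

  // Returns false only when this slave already has a task with this ID.
  bool add(const Task& task) {
    return tasks[task.slaveId].emplace(task.id, task).second;
  }

  // Number of slaves currently holding a task with the given ID.
  std::size_t count(const TaskID& id) const {
    std::size_t n = 0;
    for (const auto& entry : tasks) {
      n += entry.second.count(id);
    }
    return n;
  }
};

// Replays the failover scenario from the description against both indexes.
inline void replayFailoverScenario() {
  const Task fresh{"task-1", "slave-new"};  // launched after failover
  const Task stale{"task-1", "slave-old"};  // re-registered later

  FlatTaskIndex flat;
  assert(flat.add(fresh));
  assert(!flat.add(stale));  // duplicate task ID is rejected

  PerSlaveTaskIndex perSlave;
  assert(perSlave.add(fresh));
  assert(perSlave.add(stale));  // accepted: different slave
  assert(perSlave.count("task-1") == 2);
}
```

Calling replayFailoverScenario() walks through both cases. The nested map trades a slightly more expensive lookup by task ID alone (every slave must be scanned) for the ability to represent the duplicate instead of aborting.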
Diffs
-----
src/master/http.cpp 37d76ee
src/master/master.hpp 36c6759
src/master/master.cpp 95207d2
src/tests/master_tests.cpp 8a6b98b
Diff: https://reviews.apache.org/r/37531/diff/
Testing
-------
make
make check
Thanks,
Klaus Ma
Re: Review Request 37531: Fix master CHECK failure if a framework uses
duplicated task id.
Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review114243
-----------------------------------------------------------
Patch looks great!
Reviews applied: [37531]
Passed command: export OS=ubuntu:14.04;export CONFIGURATION="--verbose";export COMPILER=gcc; ./support/docker_build.sh
- Mesos ReviewBot
On Jan. 13, 2016, 2:06 p.m., Klaus Ma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> -----------------------------------------------------------
>
> (Updated Jan. 13, 2016, 2:06 p.m.)
>
>
> Review request for mesos, Jie Yu and Vinod Kone.
>
>
> Bugs: MESOS-3070
> https://issues.apache.org/jira/browse/MESOS-3070
>
>
> Repository: mesos
>
>
> Description
> -------
>
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
>
> __Root Cause:__
> Task IDs are stored on the slave (agent). After a master failover, there is a time window in which a new slave can launch a task with the same task ID; when the old task is then re-registered, the master crashes because of the duplicated task ID.
>
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID, so the same task ID on two slaves no longer conflicts.
>
>
> Diffs
> -----
>
> src/master/http.cpp bcafc7aff89659a68352f3876ce6042f8b34bd5d
> src/master/master.hpp f02d165874fa8023675e545793de699aeecae29b
> src/master/master.cpp c122c30d943813fc3ce9e7025783c7231809b022
> src/tests/master_tests.cpp 223b9d20a3a8a8194a3a6a605ec2394c37ab5957
>
> Diff: https://reviews.apache.org/r/37531/diff/
>
>
> Testing
> -------
>
> make
> make check
>
>
> Thanks,
>
> Klaus Ma
>
>
Re: Review Request 37531: Fix master CHECK failure if a framework uses
duplicated task id.
Posted by Klaus Ma <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review120073
-----------------------------------------------------------
ping @jieyu/vinodkone.
- Klaus Ma
On Jan. 13, 2016, 10:06 p.m., Klaus Ma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> -----------------------------------------------------------
>
> (Updated Jan. 13, 2016, 10:06 p.m.)
>
>
> Review request for mesos, Jie Yu and Vinod Kone.
>
>
> Bugs: MESOS-3070
> https://issues.apache.org/jira/browse/MESOS-3070
>
>
> Repository: mesos
>
>
> Description
> -------
>
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
>
> __Root Cause:__
> Task IDs are stored on the slave (agent). After a master failover, there is a time window in which a new slave can launch a task with the same task ID; when the old task is then re-registered, the master crashes because of the duplicated task ID.
>
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID, so the same task ID on two slaves no longer conflicts.
>
>
> Diffs
> -----
>
> src/master/http.cpp bcafc7aff89659a68352f3876ce6042f8b34bd5d
> src/master/master.hpp f02d165874fa8023675e545793de699aeecae29b
> src/master/master.cpp c122c30d943813fc3ce9e7025783c7231809b022
> src/tests/master_tests.cpp 223b9d20a3a8a8194a3a6a605ec2394c37ab5957
>
> Diff: https://reviews.apache.org/r/37531/diff/
>
>
> Testing
> -------
>
> make
> make check
>
>
> Thanks,
>
> Klaus Ma
>
>
Re: Review Request 37531: Fix master CHECK failure if a framework uses
duplicated task id.
Posted by Klaus Ma <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------
(Updated Jan. 13, 2016, 10:06 p.m.)
Review request for mesos, Jie Yu and Vinod Kone.
Changes
-------
rebase and ping Vinod :).
Summary (updated)
-----------------
Fix master CHECK failure if a framework uses duplicated task id.
Bugs: MESOS-3070
https://issues.apache.org/jira/browse/MESOS-3070
Repository: mesos
Description
-------
__Phenomenon:__
The master crashes because of a duplicated task ID.
__Root Cause:__
Task IDs are stored on the slave (agent). After a master failover, there is a time window in which a new slave can launch a task with the same task ID; when the old task is then re-registered, the master crashes because of the duplicated task ID.
__Solution:__
Store task info in Master::Framework keyed by SlaveID, so the same task ID on two slaves no longer conflicts.
Diffs (updated)
-----
src/master/http.cpp bcafc7aff89659a68352f3876ce6042f8b34bd5d
src/master/master.hpp f02d165874fa8023675e545793de699aeecae29b
src/master/master.cpp c122c30d943813fc3ce9e7025783c7231809b022
src/tests/master_tests.cpp 223b9d20a3a8a8194a3a6a605ec2394c37ab5957
Diff: https://reviews.apache.org/r/37531/diff/
Testing
-------
make
make check
Thanks,
Klaus Ma
Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a
framework uses duplicated task id)
Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review100737
-----------------------------------------------------------
Patch looks great!
Reviews applied: [37531]
All tests passed.
- Mesos ReviewBot
On Sept. 26, 2015, 2:52 a.m., Klaus Ma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> -----------------------------------------------------------
>
> (Updated Sept. 26, 2015, 2:52 a.m.)
>
>
> Review request for mesos, Jie Yu and Vinod Kone.
>
>
> Bugs: MESOS-3070
> https://issues.apache.org/jira/browse/MESOS-3070
>
>
> Repository: mesos
>
>
> Description
> -------
>
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
>
> __Root Cause:__
> Task IDs are stored on the slave (agent). After a master failover, there is a time window in which a new slave can launch a task with the same task ID; when the old task is then re-registered, the master crashes because of the duplicated task ID.
>
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID, so the same task ID on two slaves no longer conflicts.
>
>
> Diffs
> -----
>
> src/master/http.cpp cd37c91
> src/master/master.hpp 4bb65f0
> src/master/master.cpp 6bee4f3
> src/tests/master_tests.cpp ee24739
>
> Diff: https://reviews.apache.org/r/37531/diff/
>
>
> Testing
> -------
>
> make
> make check
>
>
> Thanks,
>
> Klaus Ma
>
>
Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a
framework uses duplicated task id)
Posted by Klaus Ma <kl...@cguru.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------
(Updated Sept. 26, 2015, 2:52 a.m.)
Review request for mesos, Jie Yu and Vinod Kone.
Changes
-------
Merged with the latest code and re-checked for potential issues. I'll add more unit test cases for "kill duplicated tasks" and "show duplicated tasks in metrics".
Bugs: MESOS-3070
https://issues.apache.org/jira/browse/MESOS-3070
Repository: mesos
Description
-------
__Phenomenon:__
The master crashes because of a duplicated task ID.
__Root Cause:__
Task IDs are stored on the slave (agent). After a master failover, there is a time window in which a new slave can launch a task with the same task ID; when the old task is then re-registered, the master crashes because of the duplicated task ID.
__Solution:__
Store task info in Master::Framework keyed by SlaveID, so the same task ID on two slaves no longer conflicts.
Diffs (updated)
-----
src/master/http.cpp cd37c91
src/master/master.hpp 4bb65f0
src/master/master.cpp 6bee4f3
src/tests/master_tests.cpp ee24739
Diff: https://reviews.apache.org/r/37531/diff/
Testing
-------
make
make check
Thanks,
Klaus Ma