Posted to reviews@mesos.apache.org by Klaus Ma <kl...@cguru.net> on 2015/09/05 05:27:30 UTC

Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------

(Updated Sept. 5, 2015, 3:27 a.m.)


Review request for mesos and Vinod Kone.


Changes
-------

Add summary & description


Summary (updated)
-----------------

MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)


Bugs: MESOS-3070
    https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description (updated)
-------

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
Task IDs are stored on the slave (agent). If the master fails over, there is a time window in which a new slave can launch a task with the same task ID; if the old task is then re-registered, the master crashes because of the duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different slaves no longer collides.
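
A minimal sketch of the idea behind the solution, not the actual patch: TaskID and SlaveID are simplified to strings, and the Framework struct with its addTask and countTask helpers is a hypothetical stand-in for the bookkeeping in Master::Framework. It only illustrates why keying tasks by SlaveID first lets two slaves report the same task ID without a collision.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the Mesos TaskID / SlaveID types.
using TaskID = std::string;
using SlaveID = std::string;

struct Task {
  TaskID taskId;
  SlaveID slaveId;
};

// Simplified, assumed model of the per-framework task bookkeeping.
struct Framework {
  // Tasks keyed first by slave, then by task ID, so the same task ID
  // reported by two different slaves ends up in two separate entries.
  std::map<SlaveID, std::map<TaskID, Task>> tasks;

  // Returns false (instead of aborting) if this slave already holds a
  // task with the same ID; returns true if the task was recorded.
  bool addTask(const Task& task) {
    return tasks[task.slaveId].emplace(task.taskId, task).second;
  }

  // Counts how many slaves currently hold a task with the given ID.
  std::size_t countTask(const TaskID& taskId) const {
    std::size_t n = 0;
    for (const auto& entry : tasks) {
      n += entry.second.count(taskId);
    }
    return n;
  }
};
```

With a flat map keyed only by task ID, the second registration of the same ID would hit the duplicate check; in this sketch the old and the new task are both kept, each under its own slave.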


Diffs
-----

  src/master/http.cpp 37d76ee 
  src/master/master.hpp 36c6759 
  src/master/master.cpp 95207d2 
  src/tests/master_tests.cpp 8a6b98b 

Diff: https://reviews.apache.org/r/37531/diff/


Testing
-------

make
make check


Thanks,

Klaus Ma


Re: Review Request 37531: Fix master CHECK failure if a framework uses duplicated task id.

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review114243
-----------------------------------------------------------


Patch looks great!

Reviews applied: [37531]

Passed command: export OS=ubuntu:14.04;export CONFIGURATION="--verbose";export COMPILER=gcc; ./support/docker_build.sh

- Mesos ReviewBot


On Jan. 13, 2016, 2:06 p.m., Klaus Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> -----------------------------------------------------------
> 
> (Updated Jan. 13, 2016, 2:06 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Vinod Kone.
> 
> 
> Bugs: MESOS-3070
>     https://issues.apache.org/jira/browse/MESOS-3070
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
> 
> __Root Cause:__
> Task IDs are stored on the slave (agent). If the master fails over, there is a time window in which a new slave can launch a task with the same task ID; if the old task is then re-registered, the master crashes because of the duplicated task ID.
> 
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different slaves no longer collides.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp bcafc7aff89659a68352f3876ce6042f8b34bd5d 
>   src/master/master.hpp f02d165874fa8023675e545793de699aeecae29b 
>   src/master/master.cpp c122c30d943813fc3ce9e7025783c7231809b022 
>   src/tests/master_tests.cpp 223b9d20a3a8a8194a3a6a605ec2394c37ab5957 
> 
> Diff: https://reviews.apache.org/r/37531/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> 
> 
> Thanks,
> 
> Klaus Ma
> 
>


Re: Review Request 37531: Fix master CHECK failure if a framework uses duplicated task id.

Posted by Klaus Ma <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review120073
-----------------------------------------------------------



ping @jieyu/vinodkone.

- Klaus Ma


On Jan. 13, 2016, 10:06 p.m., Klaus Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> -----------------------------------------------------------
> 
> (Updated Jan. 13, 2016, 10:06 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Vinod Kone.
> 
> 
> Bugs: MESOS-3070
>     https://issues.apache.org/jira/browse/MESOS-3070
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
> 
> __Root Cause:__
> Task IDs are stored on the slave (agent). If the master fails over, there is a time window in which a new slave can launch a task with the same task ID; if the old task is then re-registered, the master crashes because of the duplicated task ID.
> 
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different slaves no longer collides.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp bcafc7aff89659a68352f3876ce6042f8b34bd5d 
>   src/master/master.hpp f02d165874fa8023675e545793de699aeecae29b 
>   src/master/master.cpp c122c30d943813fc3ce9e7025783c7231809b022 
>   src/tests/master_tests.cpp 223b9d20a3a8a8194a3a6a605ec2394c37ab5957 
> 
> Diff: https://reviews.apache.org/r/37531/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> 
> 
> Thanks,
> 
> Klaus Ma
> 
>


Re: Review Request 37531: Fix master CHECK failure if a framework uses duplicated task id.

Posted by Klaus Ma <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------

(Updated Jan. 13, 2016, 10:06 p.m.)


Review request for mesos, Jie Yu and Vinod Kone.


Changes
-------

rebase and ping Vinod :).


Summary (updated)
-----------------

Fix master CHECK failure if a framework uses duplicated task id.


Bugs: MESOS-3070
    https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description
-------

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
Task IDs are stored on the slave (agent). If the master fails over, there is a time window in which a new slave can launch a task with the same task ID; if the old task is then re-registered, the master crashes because of the duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different slaves no longer collides.


Diffs (updated)
-----

  src/master/http.cpp bcafc7aff89659a68352f3876ce6042f8b34bd5d 
  src/master/master.hpp f02d165874fa8023675e545793de699aeecae29b 
  src/master/master.cpp c122c30d943813fc3ce9e7025783c7231809b022 
  src/tests/master_tests.cpp 223b9d20a3a8a8194a3a6a605ec2394c37ab5957 

Diff: https://reviews.apache.org/r/37531/diff/


Testing
-------

make
make check


Thanks,

Klaus Ma


Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review100737
-----------------------------------------------------------


Patch looks great!

Reviews applied: [37531]

All tests passed.

- Mesos ReviewBot


On Sept. 26, 2015, 2:52 a.m., Klaus Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> -----------------------------------------------------------
> 
> (Updated Sept. 26, 2015, 2:52 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Vinod Kone.
> 
> 
> Bugs: MESOS-3070
>     https://issues.apache.org/jira/browse/MESOS-3070
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
> 
> __Root Cause:__
> Task IDs are stored on the slave (agent). If the master fails over, there is a time window in which a new slave can launch a task with the same task ID; if the old task is then re-registered, the master crashes because of the duplicated task ID.
> 
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different slaves no longer collides.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp cd37c91 
>   src/master/master.hpp 4bb65f0 
>   src/master/master.cpp 6bee4f3 
>   src/tests/master_tests.cpp ee24739 
> 
> Diff: https://reviews.apache.org/r/37531/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> 
> 
> Thanks,
> 
> Klaus Ma
> 
>


Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)

Posted by Klaus Ma <kl...@cguru.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------

(Updated Sept. 26, 2015, 2:52 a.m.)


Review request for mesos, Jie Yu and Vinod Kone.


Changes
-------

Merged with the latest code and re-checked for potential issues. I'll add more unit test cases for "kill duplicated tasks" and "show duplicated tasks in metrics".


Bugs: MESOS-3070
    https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description
-------

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
Task IDs are stored on the slave (agent). If the master fails over, there is a time window in which a new slave can launch a task with the same task ID; if the old task is then re-registered, the master crashes because of the duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different slaves no longer collides.


Diffs (updated)
-----

  src/master/http.cpp cd37c91 
  src/master/master.hpp 4bb65f0 
  src/master/master.cpp 6bee4f3 
  src/tests/master_tests.cpp ee24739 

Diff: https://reviews.apache.org/r/37531/diff/


Testing
-------

make
make check


Thanks,

Klaus Ma