Posted to reviews@mesos.apache.org by Anand Mazumdar <ma...@gmail.com> on 2015/11/24 19:25:43 UTC
Review Request 40660: Linked against executor PID's to ensure ordered
message delivery
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40660/
-----------------------------------------------------------
Review request for mesos and Vinod Kone.
Bugs: MESOS-3851
https://issues.apache.org/jira/browse/MESOS-3851
Repository: mesos
Description
-------
Previously, we did not `link` against the executor `PID` while (re-)registering. As a result, libprocess could create an ephemeral socket every time a `send(...)` was invoked, which led to races where messages could arrive at the executor out of order. This change does a `link(...)` on the executor PID to ensure ordered message delivery.
---Not to be included in commit message---
I am still not comfortable bringing back the reverted commit https://reviews.apache.org/r/40107/ . I can see one more race condition even with a `link(...)`. Messages can still arrive out of order if the first socket fails after the first message has been sent but while that message is still in flight. Sending the second message then creates a new socket, and the second message might arrive earlier than the first, leading to a race. But this is a behavior that is heavily relied upon elsewhere in our code-base. Happy to be proven wrong, though, and to be convinced that we can bring back the reverted commit after this change.
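The ordering argument above can be illustrated with a toy model. This is a minimal sketch, not libprocess code: it assumes only that messages on the same socket arrive in FIFO order and that nothing orders messages across different sockets. The function name `possible_arrival_orders`, the socket labels, and the message names are all hypothetical.

```python
from itertools import permutations


def possible_arrival_orders(sends):
    """Return every arrival order allowed by per-socket FIFO delivery.

    `sends` is a list of (socket_id, message) pairs in the order they
    were sent. Messages on the same socket must arrive in send order;
    messages on different sockets may interleave arbitrarily.
    (Toy model: socket ids and messages are hypothetical labels.)
    """
    orders = set()
    for perm in permutations(range(len(sends))):
        last_index_on_socket = {}
        fifo_respected = True
        for idx in perm:
            sock = sends[idx][0]
            # A later send on this socket has already arrived: not FIFO.
            if last_index_on_socket.get(sock, -1) > idx:
                fifo_respected = False
                break
            last_index_on_socket[sock] = idx
        if fifo_respected:
            orders.add(tuple(sends[i][1] for i in perm))
    return orders


# No link(): each send() is modelled as using its own ephemeral socket.
unlinked = possible_arrival_orders(
    [("ephemeral-1", "first"), ("ephemeral-2", "second")])

# With link(): both sends are modelled as sharing one persistent socket.
linked = possible_arrival_orders(
    [("persistent", "first"), ("persistent", "second")])

# Residual race: the persistent socket fails while "first" is in flight,
# so "second" is modelled as going out on a replacement socket.
after_failure = possible_arrival_orders(
    [("persistent", "first"), ("replacement", "second")])
```

In this model `linked` admits only the in-order arrival, while `unlinked` and `after_failure` both also admit "second" arriving before "first", which is the remaining race described above.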
Diffs
-----
src/slave/slave.cpp 9055f2a789cb19f3579c15a379ea505dfef0578c
Diff: https://reviews.apache.org/r/40660/diff/
Testing
-------
make check
Thanks,
Anand Mazumdar
Re: Review Request 40660: Linked against executor PID's to ensure
ordered message delivery
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40660/#review109044
-----------------------------------------------------------
Ship it!
We may also want to link in the recovery path, but the agent <-> executor protocol is such that we don't need to in order to fix the issue.
- Ben Mahler
Re: Review Request 40660: Linked against executor PID's to ensure
ordered message delivery
Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40660/#review107926
-----------------------------------------------------------
Patch looks great!
Reviews applied: [40660]
Passed command: export OS=ubuntu:14.04;export CONFIGURATION="--verbose";export COMPILER=gcc; ./support/docker_build.sh
- Mesos ReviewBot