You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2016/12/08 01:46:32 UTC

Review Request 54518: Refactored the reaping logic in I/O switchboard.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54518/
-----------------------------------------------------------

Review request for mesos and Kevin Klues.


Bugs: MESOS-6756
    https://issues.apache.org/jira/browse/MESOS-6756


Repository: mesos


Description
-------

Previously, we don't handle the case where reaping failed. This patch
refactored the reaping logic. The main idea is to register a 'reaped'
callback which gets invoked when the reaping of the I/O switchboard
server process has a result. This also changes the 'watch' logic.


Diffs
-----

  src/slave/containerizer/mesos/io/switchboard.hpp a8a1e2558432ae1e4c7e5513e5653feba4352056 
  src/slave/containerizer/mesos/io/switchboard.cpp af956951de9c605c68f6ec16a7edf3c33f4bd499 

Diff: https://reviews.apache.org/r/54518/diff/


Testing
-------

make check


Thanks,

Jie Yu


Re: Review Request 54518: Refactored the reaping logic in I/O switchboard.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 8, 2016, 2:15 a.m., Kevin Klues wrote:
> > src/slave/containerizer/mesos/io/switchboard.cpp, line 815
> > <https://reviews.apache.org/r/54518/diff/1/?file=1579626#file1579626line815>
> >
> >     Why do you have to do things this way? With the `await()` and the `list` of futures.
> >     
> >     Can't we just have an `.onAny()` on status?

Unfortunately, 'status' is a `Future<Option<int>>` and the return value of this method is `Future<Nothing>`. I cannot use `.then` to chain becaues I care about the failure/discarded case as well.


> On Dec. 8, 2016, 2:15 a.m., Kevin Klues wrote:
> > src/slave/containerizer/mesos/io/switchboard.cpp, lines 849-853
> > <https://reviews.apache.org/r/54518/diff/1/?file=1579626#file1579626line849>
> >
> >     When does this case happen?

is waitpid returns a failure, we might get this, which i believe should not happen. Just a bit of defensive.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54518/#review158479
-----------------------------------------------------------


On Dec. 8, 2016, 1:46 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54518/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:46 a.m.)
> 
> 
> Review request for mesos and Kevin Klues.
> 
> 
> Bugs: MESOS-6756
>     https://issues.apache.org/jira/browse/MESOS-6756
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we don't handle the case where reaping failed. This patch
> refactored the reaping logic. The main idea is to register a 'reaped'
> callback which gets invoked when the reaping of the I/O switchboard
> server process has a result. This also changes the 'watch' logic.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp a8a1e2558432ae1e4c7e5513e5653feba4352056 
>   src/slave/containerizer/mesos/io/switchboard.cpp af956951de9c605c68f6ec16a7edf3c33f4bd499 
> 
> Diff: https://reviews.apache.org/r/54518/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 54518: Refactored the reaping logic in I/O switchboard.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 8, 2016, 2:15 a.m., Kevin Klues wrote:
> > src/slave/containerizer/mesos/io/switchboard.cpp, line 865
> > <https://reviews.apache.org/r/54518/diff/1/?file=1579626#file1579626line865>
> >
> >     Do you mean:
> >     ```
> >     if (WIFEXITED(status.get()) && WEXITSTATUS(status.get()) == 0)
> >     ```

yeah, they should be the same. But I'll use your way as it is more explicit.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54518/#review158479
-----------------------------------------------------------


On Dec. 8, 2016, 1:46 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54518/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:46 a.m.)
> 
> 
> Review request for mesos and Kevin Klues.
> 
> 
> Bugs: MESOS-6756
>     https://issues.apache.org/jira/browse/MESOS-6756
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we don't handle the case where reaping failed. This patch
> refactored the reaping logic. The main idea is to register a 'reaped'
> callback which gets invoked when the reaping of the I/O switchboard
> server process has a result. This also changes the 'watch' logic.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp a8a1e2558432ae1e4c7e5513e5653feba4352056 
>   src/slave/containerizer/mesos/io/switchboard.cpp af956951de9c605c68f6ec16a7edf3c33f4bd499 
> 
> Diff: https://reviews.apache.org/r/54518/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 54518: Refactored the reaping logic in I/O switchboard.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 8, 2016, 2:15 a.m., Kevin Klues wrote:
> > So it looks like the basic idea in this patch is to:
> > 
> > 1) Add a `reaped()` function which we call whenever a process is reaped. From there we log why the process was reaped and (possibly) set a limitation on it if it died abnormally. We store this limitation in the switchboard's info struct.
> > 2) Simplify the logic in `watch()` down to do nothing more than return the limitation future from the switchboard's info struct.
> > 3) Change `cleanup()` to only send SIGTERM to the switchboard process if its status is still pending (i.e. it hasn't been reaped yet).
> > 4) Change cleanup to properly remove the unix domain socket file in all cases of the `reaped()` future being satisifed.
> > 
> > If so, it sounds/looks good to me overall . Just a few comments.

Nice summary. I'll use that as part of the commit message!


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54518/#review158479
-----------------------------------------------------------


On Dec. 8, 2016, 1:46 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54518/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:46 a.m.)
> 
> 
> Review request for mesos and Kevin Klues.
> 
> 
> Bugs: MESOS-6756
>     https://issues.apache.org/jira/browse/MESOS-6756
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we don't handle the case where reaping failed. This patch
> refactored the reaping logic. The main idea is to register a 'reaped'
> callback which gets invoked when the reaping of the I/O switchboard
> server process has a result. This also changes the 'watch' logic.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp a8a1e2558432ae1e4c7e5513e5653feba4352056 
>   src/slave/containerizer/mesos/io/switchboard.cpp af956951de9c605c68f6ec16a7edf3c33f4bd499 
> 
> Diff: https://reviews.apache.org/r/54518/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 54518: Refactored the reaping logic in I/O switchboard.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54518/#review158479
-----------------------------------------------------------



So it looks like the basic idea in this patch is to:

1) Add a `reaped()` function which we call whenever a process is reaped. From there we log why the process was reaped and (possibly) set a limitation on it if it died abnormally. We store this limitation in the switchboard's info struct.
2) Simplify the logic in `watch()` down to do nothing more than return the limitation future from the switchboard's info struct.
3) Change `cleanup()` to only send SIGTERM to the switchboard process if its status is still pending (i.e. it hasn't been reaped yet).
4) Change cleanup to properly remove the unix domain socket file in all cases of the `reaped()` future being satisifed.

If so, it sounds/looks good to me overall . Just a few comments.


src/slave/containerizer/mesos/io/switchboard.cpp (line 779)
<https://reviews.apache.org/r/54518/#comment229244>

    Why do you have to do things this way? With the `await()` and the `list` of futures.
    
    Can't we just have an `.onAny()` on status?



src/slave/containerizer/mesos/io/switchboard.cpp (lines 812 - 816)
<https://reviews.apache.org/r/54518/#comment229245>

    When does this case happen?



src/slave/containerizer/mesos/io/switchboard.cpp (line 828)
<https://reviews.apache.org/r/54518/#comment229248>

    Do you mean:
    ```
    if (WIFEXITED(status.get()) && WEXITSTATUS(status.get()) == 0)
    ```



src/slave/containerizer/mesos/io/switchboard.cpp (line 834)
<https://reviews.apache.org/r/54518/#comment229249>

    s/is/if/


- Kevin Klues


On Dec. 8, 2016, 1:46 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54518/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:46 a.m.)
> 
> 
> Review request for mesos and Kevin Klues.
> 
> 
> Bugs: MESOS-6756
>     https://issues.apache.org/jira/browse/MESOS-6756
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we don't handle the case where reaping failed. This patch
> refactored the reaping logic. The main idea is to register a 'reaped'
> callback which gets invoked when the reaping of the I/O switchboard
> server process has a result. This also changes the 'watch' logic.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp a8a1e2558432ae1e4c7e5513e5653feba4352056 
>   src/slave/containerizer/mesos/io/switchboard.cpp af956951de9c605c68f6ec16a7edf3c33f4bd499 
> 
> Diff: https://reviews.apache.org/r/54518/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 54518: Refactored the reaping logic in I/O switchboard.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54518/#review158489
-----------------------------------------------------------


Ship it!




Ship It!

- Kevin Klues


On Dec. 8, 2016, 1:46 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54518/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:46 a.m.)
> 
> 
> Review request for mesos and Kevin Klues.
> 
> 
> Bugs: MESOS-6756
>     https://issues.apache.org/jira/browse/MESOS-6756
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we don't handle the case where reaping failed. This patch
> refactored the reaping logic. The main idea is to register a 'reaped'
> callback which gets invoked when the reaping of the I/O switchboard
> server process has a result. This also changes the 'watch' logic.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp a8a1e2558432ae1e4c7e5513e5653feba4352056 
>   src/slave/containerizer/mesos/io/switchboard.cpp af956951de9c605c68f6ec16a7edf3c33f4bd499 
> 
> Diff: https://reviews.apache.org/r/54518/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>