Posted to reviews@mesos.apache.org by Andrew Schwartzmeyer <an...@schwartzmeyer.com> on 2017/02/07 02:31:09 UTC

Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/
-----------------------------------------------------------

Review request for mesos, Alex Clemmer and Joseph Wu.


Bugs: MESOS-6892
    https://issues.apache.org/jira/browse/MESOS-6892


Repository: mesos


Description
-------

`os::create_job` now returns a `Try<SharedHandle>` instead of a raw
`HANDLE`, forcing ownership of the job object handle onto the caller
of the function. `create_job` requires a `std::string name` for the
job object, which is mapped from a PID using `os::name_job`.

The assignment of a process to the job object is now done via
`Try<Nothing> os::assign_job(SharedHandle, pid_t)`.

The equivalent of killing a process tree with job object semantics
is simply to terminate the job object. This is done via
`os::kill_job(SharedHandle)`.
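
A minimal, portable sketch of how the three calls described above might fit together. Everything in the `job_api_sketch` namespace is a hypothetical stand-in written for illustration: the `Try`, `Nothing`, `Job`, and `SharedHandle` types, the `Pid` alias, and the `job-<pid>` naming scheme are assumptions, not stout's implementation. Only the call shapes follow the description.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

namespace job_api_sketch {

using Pid = unsigned long;  // assumption: stands in for `pid_t`

struct Nothing {};

// Simplified stand-in for `Try<T>`: a value plus an error message.
// An empty message means success.
template <typename T>
struct Try {
  T value;
  std::string error;
  bool isError() const { return !error.empty(); }
};

// Stand-in for a kernel job object: a name and its member processes.
struct Job {
  explicit Job(std::string n) : name(std::move(n)), terminated(false) {}
  std::string name;
  std::set<Pid> members;
  bool terminated;
};

// Stand-in for `SharedHandle`: reference-counted ownership of the job.
using SharedHandle = std::shared_ptr<Job>;

// Maps a PID to a job name (the "job-" prefix is made up for this sketch).
inline std::string name_job(Pid pid) { return "job-" + std::to_string(pid); }

// Returns an owning handle, so the caller decides how long the job lives.
inline Try<SharedHandle> create_job(const std::string& name) {
  if (name.empty()) {
    return {nullptr, "job name must not be empty"};
  }
  return {std::make_shared<Job>(name), ""};
}

// Adds a process to the job; fails on a missing or terminated job.
inline Try<Nothing> assign_job(SharedHandle job, Pid pid) {
  if (!job || job->terminated) {
    return {Nothing{}, "cannot assign to a missing or terminated job"};
  }
  job->members.insert(pid);
  return {Nothing{}, ""};
}

// Terminating the job takes every member process down with it.
inline Try<Nothing> kill_job(SharedHandle job) {
  if (!job) {
    return {Nothing{}, "no job to terminate"};
  }
  job->members.clear();
  job->terminated = true;
  return {Nothing{}, ""};
}

}  // namespace job_api_sketch
```

In this model a caller would name the job from a PID, create it, assign the process, and later call `kill_job` on the same handle instead of walking a process tree.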


Diffs
-----

  3rdparty/stout/include/stout/windows/os.hpp b5172fca96c4151f4b1ebb6d343022558f45fc34 

Diff: https://reviews.apache.org/r/56364/diff/


Testing
-------


Thanks,

Andrew Schwartzmeyer


Re: Review Request 56364: Windows: Stout: Rewrite job object wrappers.

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/
-----------------------------------------------------------

(Updated March 27, 2017, 10:10 p.m.)


Review request for mesos, Alex Clemmer and Joseph Wu.


Changes
-------

Updated for `JobObjectManager` changes.


Summary (updated)
-----------------

Windows: Stout: Rewrite job object wrappers.


Bugs: MESOS-6892
    https://issues.apache.org/jira/browse/MESOS-6892


Repository: mesos


Description (updated)
-------

`os::create_job` now returns a `Try<SharedHandle>` instead of a raw
`HANDLE`, forcing ownership of the job object handle onto the caller
of the function. `create_job` requires a `std::string name` for the
job object, which is mapped from a PID using `os::name_job`.

The assignment of a process to the job object is now done via
`Try<Nothing> os::assign_job(SharedHandle, pid_t)`.

The equivalent of killing a process tree with job object semantics
is simply to terminate the job object. This is done via
`os::kill_job(SharedHandle)`.


Diffs (updated)
-----

  3rdparty/stout/include/stout/windows/os.hpp 0bedb2d63f5b36afdac2b5a29986f38be96b7c16 


Diff: https://reviews.apache.org/r/56364/diff/4/

Changes: https://reviews.apache.org/r/56364/diff/3-4/


Testing
-------


Thanks,

Andrew Schwartzmeyer


Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/
-----------------------------------------------------------

(Updated Feb. 12, 2017, 6:29 p.m.)


Review request for mesos, Alex Clemmer and Joseph Wu.


Changes
-------

Remove dependency on `hashmap` changes.


Bugs: MESOS-6892
    https://issues.apache.org/jira/browse/MESOS-6892


Repository: mesos


Description
-------

`os::create_job` now returns a `Try<SharedHandle>` instead of a raw
`HANDLE`, forcing ownership of the job object handle onto the caller
of the function. `create_job` requires a `std::string name` for the
job object, which is mapped from a PID using `os::name_job`.

The assignment of a process to the job object is now done via
`Try<Nothing> os::assign_job(SharedHandle, pid_t)`.

The equivalent of killing a process tree with job object semantics
is simply to terminate the job object. This is done via
`os::kill_job(SharedHandle)`.


Diffs
-----

  3rdparty/stout/include/stout/windows/os.hpp b5172fca96c4151f4b1ebb6d343022558f45fc34 

Diff: https://reviews.apache.org/r/56364/diff/


Testing
-------


Thanks,

Andrew Schwartzmeyer


Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/
-----------------------------------------------------------

(Updated Feb. 8, 2017, 9:34 p.m.)


Review request for mesos, Alex Clemmer and Joseph Wu.


Changes
-------

Requested changes.


Bugs: MESOS-6892
    https://issues.apache.org/jira/browse/MESOS-6892


Repository: mesos


Description
-------

`os::create_job` now returns a `Try<SharedHandle>` instead of a raw
`HANDLE`, forcing ownership of the job object handle onto the caller
of the function. `create_job` requires a `std::string name` for the
job object, which is mapped from a PID using `os::name_job`.

The assignment of a process to the job object is now done via
`Try<Nothing> os::assign_job(SharedHandle, pid_t)`.

The equivalent of killing a process tree with job object semantics
is simply to terminate the job object. This is done via
`os::kill_job(SharedHandle)`.


Diffs (updated)
-----

  3rdparty/stout/include/stout/windows/os.hpp b5172fca96c4151f4b1ebb6d343022558f45fc34 

Diff: https://reviews.apache.org/r/56364/diff/


Testing
-------


Thanks,

Andrew Schwartzmeyer


Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/
-----------------------------------------------------------

(Updated Feb. 8, 2017, 9:10 p.m.)


Review request for mesos, Alex Clemmer and Joseph Wu.


Changes
-------

Remove the patch dependency (there is none).


Bugs: MESOS-6892
    https://issues.apache.org/jira/browse/MESOS-6892


Repository: mesos


Description (updated)
-------

`os::create_job` now returns a `Try<SharedHandle>` instead of a raw
`HANDLE`, forcing ownership of the job object handle onto the caller
of the function. `create_job` requires a `std::string name` for the
job object, which is mapped from a PID using `os::name_job`.

The assignment of a process to the job object is now done via
`Try<Nothing> os::assign_job(SharedHandle, pid_t)`.

The equivalent of killing a process tree with job object semantics
is simply to terminate the job object. This is done via
`os::kill_job(SharedHandle)`.


Diffs (updated)
-----

  3rdparty/stout/include/stout/windows/os.hpp b5172fca96c4151f4b1ebb6d343022558f45fc34 

Diff: https://reviews.apache.org/r/56364/diff/


Testing
-------


Thanks,

Andrew Schwartzmeyer


Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.

> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, line 729
> > <https://reviews.apache.org/r/56364/diff/1/?file=1625895#file1625895line729>
> >
> >     I believe using `BOOL` as a boolean results in a warning.  We often just compare it to `FALSE`.

Sure, I'll fix it for my changes. However, I'll note I was following earlier convention. These `BOOL`s are compared with a `!` all through the existing Windows OS code.
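
For reference, a small sketch of the two spellings being discussed. The names in `bool_sketch` are hypothetical stand-ins so that it compiles without `<windows.h>` (where `BOOL` is an `int` typedef and `FALSE` is zero); `fake_call` is not a real API.

```cpp
#include <cassert>

namespace bool_sketch {

// Stand-ins for the Windows definitions, assumed here for portability.
typedef int BOOL;
const BOOL kFalse = 0;  // plays the role of the `FALSE` macro

// Hypothetical Win32-style call: nonzero on success, zero on failure.
inline BOOL fake_call(bool succeed) { return succeed ? 1 : kFalse; }

// Spelling used in the older code: negate the BOOL directly.
inline bool failed_with_not(BOOL result) { return !result; }

// Spelling requested in the review: compare against FALSE explicitly.
inline bool failed_with_compare(BOOL result) { return result == kFalse; }

}  // namespace bool_sketch
```

Both helpers test the same condition, so the change is one of style rather than behavior. Comparing against `FALSE` (and not against `TRUE`) also stays correct for calls that report success with a nonzero value other than 1.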


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/#review164619
-----------------------------------------------------------




Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Alex Clemmer <cl...@gmail.com>.

> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, lines 718-721
> > <https://reviews.apache.org/r/56364/diff/1/?file=1625895#file1625895line718>
> >
> >     This is a good default.  But we need a way to toggle this behavior, such that the agent's death does not kill child jobs.
> >     
> >     i.e. A Windows version of ChildHook::SETSID

I asked Andy to follow up this patch with a different changeset that decouples the life of the executor from the life of the agent. Since the agent already kills all executors when it dies, I think it makes sense to have one set of patches just adding Mesos Containers support to Windows, and one fixing the semantics of a dying Agent.

Thoughts?


- Alex


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/#review164619
-----------------------------------------------------------




Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Alex Clemmer <cl...@gmail.com>.

> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > I need to check how this is used in the rest of the review chain, but...
> > 
> > Giving the ownership of the HANDLE to the caller may require much larger changes in the codebase.  You may notice that we simply leak some PIDs in some parts of the codebase.  So we have to make sure we aren't leaking these shared objects.

I'm not sure I understand, actually.

Just so we're on the same page, right now we leak the Job Object handles because they're set to kill the corresponding Job when the last handle is closed -- in other words, when the Agent dies. So any time the agent dies, all of our Executors die, too. :(

One of the goals of this patch is to set the stage so the agent and executor lifecycles _can_ be decoupled, so that when the agent dies, it can recover and reconnect to the running Executors instead of simply killing them all and restarting them.

This implies that the agent should be managing the lifecycle of the Job Objects, which in particular seems to imply that it is convenient to keep those handles (or _some_ sort of ID) as state.

Make sense?


- Alex


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/#review164619
-----------------------------------------------------------




Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.

> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > I need to check how this is used in the rest of the review chain, but...
> > 
> > Giving the ownership of the HANDLE to the caller may require much larger changes in the codebase.  You may notice that we simply leak some PIDs in some parts of the codebase.  So we have to make sure we aren't leaking these shared objects.
> 
> Alex Clemmer wrote:
>     I'm not sure I understand, actually.
>     
>     Just so we're on the same page, right now we leak the Job Object handles because they're set to kill the corresponding Job when the last handle is closed -- in other words, when the Agent dies. So any time the agent dies, all of our Executors die, too. :(
>     
>     One of the goals of this patch is to set the stage so the agent and executor lifecycles _can_ be decoupled, so that when the agent dies, it can recover and reconnect to the running Executors instead of simply killing them all and restarting them.
>     
>     This implies that the agent should be managing the lifecycle of the Job Objects, which in particular seems to imply that it is convenient to keep those handles (or _some_ sort of ID) as state.
>     
>     Make sense?

Alex's description of these changes is correct. Previously the `HANDLE` was leaked, so the process implicitly owned the job object (keeping it alive for the lifetime of the executor). This patch makes that ownership explicit by forcing the launcher to own the `SharedHandle` to the job object. The lifetime semantics have not changed; they have just been made explicit instead of implicit.
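
To illustrate the explicit-ownership point, here is a rough model rather than the patch itself. It assumes the job is configured to die when its last handle closes, as described earlier in the thread; `JobState`, `own`, and `Launcher` are hypothetical names, and `SharedHandle` is approximated with a `std::shared_ptr` whose deleter plays the role of closing the handle.

```cpp
#include <cassert>
#include <memory>

namespace ownership_sketch {

// Stand-in for a kernel job object with kill-on-close behavior.
struct JobState {
  bool alive = true;
};

// Approximation of `SharedHandle`: copies share one underlying handle.
using SharedHandle = std::shared_ptr<JobState>;

// Wraps a raw "handle". The deleter does not free the object (the
// "kernel" owns it in this model); it only marks the job as terminated,
// which is what closing the last handle would cause.
inline SharedHandle own(JobState* raw) {
  return SharedHandle(raw, [](JobState* job) { job->alive = false; });
}

// The launcher holds the handle as a member, so the job's lifetime is
// tied to the launcher explicitly instead of to a leaked raw handle.
struct Launcher {
  SharedHandle job;
};

}  // namespace ownership_sketch
```

In this model the job stays alive while any copy of the handle exists and is terminated once the launcher and all other holders have released theirs, which mirrors the unchanged lifetime semantics described above.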

I need to note that:
> Giving the ownership of the HANDLE to the caller may require much larger changes in the codebase.

This is inaccurate. The original code already gave the caller ownership of the `HANDLE` (unsafely, and implicitly).

As for:
> So we have to make sure we aren't leaking these shared objects.

This is a valid concern. The `SharedHandle` needs to be owned explicitly. I believe I made this the case between this patch and https://reviews.apache.org/r/56366/, so this deserves close attention during review.


> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, lines 718-721
> > <https://reviews.apache.org/r/56364/diff/1/?file=1625895#file1625895line718>
> >
> >     This is a good default.  But we need a way to toggle this behavior, such that the agent's death does not kill child jobs.
> >     
> >     i.e. A Windows version of ChildHook::SETSID
> 
> Alex Clemmer wrote:
>     I asked Andy to follow up this patch with a different changeset that decouples the life of the executor from the life of the agent. Since the agent already kills all executors when it dies, I think it makes sense to have one set of patches just adding Mesos Containers support to Windows, and one fixing the semantics of a dying Agent.
>     
>     Thoughts?

I don't think that making this behavior toggleable belongs in this patch. I attempted to retain the existing lifecycle and behavior as closely as possible. Note that `recover()` has a `TODO` stating that we should attempt to reconnect to a possibly still-running executor; this is not currently a possible scenario, but a later patch should enable it.


> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, lines 740-744
> > <https://reviews.apache.org/r/56364/diff/1/?file=1625895#file1625895line740>
> >
> >     There's no `name` argument in this function...

Heh, yeah. This went through quite a few iterations ;) I'll fix, thanks.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/#review164619
-----------------------------------------------------------




Re: Review Request 56364: Windows: Stout: Rewrite Job Object wrappers.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/#review164619
-----------------------------------------------------------



I need to check how this is used in the rest of the review chain, but...

Giving the ownership of the HANDLE to the caller may require much larger changes in the codebase.  You may notice that we simply leak some PIDs in some parts of the codebase.  So we have to make sure we aren't leaking these shared objects.


3rdparty/stout/include/stout/windows/os.hpp (lines 710 - 713)
<https://reviews.apache.org/r/56364/#comment236387>

    This is a good default.  But we need a way to toggle this behavior, such that the agent's death does not kill child jobs.
    
    i.e. A Windows version of ChildHook::SETSID



3rdparty/stout/include/stout/windows/os.hpp (line 721)
<https://reviews.apache.org/r/56364/#comment236388>

    I believe using `BOOL` as a boolean results in a warning.  We often just compare it to `FALSE`.



3rdparty/stout/include/stout/windows/os.hpp (lines 730 - 734)
<https://reviews.apache.org/r/56364/#comment236389>

    There's no `name` argument in this function...


- Joseph Wu

