You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by James Peach <jp...@apache.org> on 2016/04/04 19:31:19 UTC

Review Request 45689: Make Cluster::Slave more tolerant of start failures.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/
-----------------------------------------------------------

Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.


Repository: mesos


Description
-------

If Cluster::Slave::start() fails, make sure we don't crash in the
destructor.


Diffs
-----

  src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 

Diff: https://reviews.apache.org/r/45689/diff/


Testing
-------

Make check.


Thanks,

James Peach


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by James Peach <jp...@apache.org>.

> On April 6, 2016, 12:18 a.m., Jiang Yan Xu wrote:
> > src/tests/cluster.hpp, line 188
> > <https://reviews.apache.org/r/45689/diff/2/?file=1327107#file1327107line188>
> >
> >     These are default initialized anyways but explicitness is good.

There is a non-default constructor and these had no list initialization or the direct initialization, so AFAICT they get a trivial initializer (ie. nothing). I made this change because the ``containerizer`` pointer ended up non-null and invalid.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review127259
-----------------------------------------------------------


On April 5, 2016, 11:41 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 5, 2016, 11:41 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.hpp 91d3998176115002ea22a0825b1aaaee3fe0d768 
>   src/tests/cluster.cpp eefc2fa55bca1ad6a53047046fa4f5996d5c3fef 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review127259
-----------------------------------------------------------


Ship it!





src/tests/cluster.hpp (line 188)
<https://reviews.apache.org/r/45689/#comment190480>

    These are default initialized anyways but explicitness is good.


- Jiang Yan Xu


On April 5, 2016, 4:41 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 5, 2016, 4:41 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.hpp 91d3998176115002ea22a0825b1aaaee3fe0d768 
>   src/tests/cluster.cpp eefc2fa55bca1ad6a53047046fa4f5996d5c3fef 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by James Peach <jp...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/
-----------------------------------------------------------

(Updated April 5, 2016, 11:41 p.m.)


Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.


Changes
-------

Rebased and addressed review comments.


Repository: mesos


Description
-------

If Cluster::Slave::start() fails, make sure we don't crash in the
destructor.


Diffs (updated)
-----

  src/tests/cluster.hpp 91d3998176115002ea22a0825b1aaaee3fe0d768 
  src/tests/cluster.cpp eefc2fa55bca1ad6a53047046fa4f5996d5c3fef 

Diff: https://reviews.apache.org/r/45689/diff/


Testing
-------

Make check.


Thanks,

James Peach


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126867
-----------------------------------------------------------



Not insisting on addressing it here but just would like to mention that we'd better not use assertions in here because we want to clean up all containers and not terminate early.

This can be done by addressing this [TODO](https://github.com/apache/mesos/blob/f921302b0a0d7c37ba1205ad12857ade614d04ff/3rdparty/libprocess/include/process/gtest.hpp#L206) and using it:
```
TODO(bmahler): Differentiate EXPECT and ASSERT here.
```

- Jiang Yan Xu


On April 4, 2016, 10:31 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 10:31 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 4, 2016, 10:35 a.m., haosdent huang wrote:
> > src/tests/cluster.cpp, line 437
> > <https://reviews.apache.org/r/45689/diff/1/?file=1324705#file1324705line437>
> >
> >     how about
> >     
> >     ```
> >     if (!containerizer) {
> >       return;
> >     }
> >     ```

+1 this is better.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126866
-----------------------------------------------------------


On April 4, 2016, 10:31 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 10:31 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by James Peach <jp...@apache.org>.

> On April 4, 2016, 5:35 p.m., haosdent huang wrote:
> > src/tests/cluster.cpp, line 437
> > <https://reviews.apache.org/r/45689/diff/1/?file=1324705#file1324705line437>
> >
> >     how about
> >     
> >     ```
> >     if (!containerizer) {
> >       return;
> >     }
> >     ```
> 
> Jiang Yan Xu wrote:
>     +1 this is better.
> 
> James Peach wrote:
>     You can't early return because you have to call ``terminate()``.
> 
> Jiang Yan Xu wrote:
>     Alright, overlooked it :) However it looks like the lines above this suffer from the same problem:
>     
>     ```
>     if (!cleanUpContainersInDestructor) {
>       return;
>     }
>     ```
>     
>     In generl I think this is OK:
>     
>     
>     How about 
>     
>     ```
>     if (!containerizer || !cleanUpContainersInDestructor) {
>       terminate();
>       return;
>     }
>     
>     ...
>     ```

AFAICT skipping cleanup tasks when ``cleanUpContainersInDestructor`` is ``false`` is by design.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126866
-----------------------------------------------------------


On April 4, 2016, 5:31 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 5:31 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On April 4, 2016, 10:35 a.m., haosdent huang wrote:
> > src/tests/cluster.cpp, line 437
> > <https://reviews.apache.org/r/45689/diff/1/?file=1324705#file1324705line437>
> >
> >     how about
> >     
> >     ```
> >     if (!containerizer) {
> >       return;
> >     }
> >     ```
> 
> Jiang Yan Xu wrote:
>     +1 this is better.
> 
> James Peach wrote:
>     You can't early return because you have to call ``terminate()``.
> 
> Jiang Yan Xu wrote:
>     Alright, overlooked it :) However it looks like the lines above this suffer from the same problem:
>     
>     ```
>     if (!cleanUpContainersInDestructor) {
>       return;
>     }
>     ```
>     
>     In generl I think this is OK:
>     
>     
>     How about 
>     
>     ```
>     if (!containerizer || !cleanUpContainersInDestructor) {
>       terminate();
>       return;
>     }
>     
>     ...
>     ```
> 
> James Peach wrote:
>     AFAICT skipping cleanup tasks when ``cleanUpContainersInDestructor`` is ``false`` is by design.

Haosdent's suggestion works in this case.  See explanation in my review posted below.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126866
-----------------------------------------------------------


On April 4, 2016, 10:31 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 10:31 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by James Peach <jp...@apache.org>.

> On April 4, 2016, 5:35 p.m., haosdent huang wrote:
> > src/tests/cluster.cpp, line 437
> > <https://reviews.apache.org/r/45689/diff/1/?file=1324705#file1324705line437>
> >
> >     how about
> >     
> >     ```
> >     if (!containerizer) {
> >       return;
> >     }
> >     ```
> 
> Jiang Yan Xu wrote:
>     +1 this is better.

You can't early return because you have to call ``terminate()``.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126866
-----------------------------------------------------------


On April 4, 2016, 5:31 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 5:31 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 4, 2016, 10:35 a.m., haosdent huang wrote:
> > src/tests/cluster.cpp, line 437
> > <https://reviews.apache.org/r/45689/diff/1/?file=1324705#file1324705line437>
> >
> >     how about
> >     
> >     ```
> >     if (!containerizer) {
> >       return;
> >     }
> >     ```
> 
> Jiang Yan Xu wrote:
>     +1 this is better.
> 
> James Peach wrote:
>     You can't early return because you have to call ``terminate()``.

Alright, overlooked it :) However it looks like the lines above this suffer from the same problem:

```
if (!cleanUpContainersInDestructor) {
  return;
}
```

In generl I think this is OK:


How about 

```
if (!containerizer || !cleanUpContainersInDestructor) {
  terminate();
  return;
}

...
```


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126866
-----------------------------------------------------------


On April 4, 2016, 10:31 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 10:31 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by haosdent huang <ha...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review126866
-----------------------------------------------------------




src/tests/cluster.cpp (line 437)
<https://reviews.apache.org/r/45689/#comment189956>

    how about
    
    ```
    if (!containerizer) {
      return;
    }
    ```


- haosdent huang


On April 4, 2016, 5:31 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 5:31 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 5, 2016, 12:52 p.m., Joseph Wu wrote:
> > src/tests/cluster.cpp, lines 364-366
> > <https://reviews.apache.org/r/45689/diff/1/?file=1324705#file1324705line364>
> >
> >     The destructor will only dereference a null `containerizer` if this error case is hit (or if you pass in a `nullptr` cast to `slave::Containerizer*`).
> >     
> >     At this point in the code, we haven't done anything other than create some objects.  So it should be ok to do something like this in the destructor:
> >     ```
> >     if (containerizer == nullptr) {
> >       return;
> >     }
> >     ```
> >     The call to `terminate()` only matters if the `cluster::Slave` is created successfully.

I agree that Haosdent's suggestion works for the purpose of this fix. But if "The call to terminate() only matters if the cluster::Slave is created successfully", then we should call terminate() under the condition that `pid` is not empty in `~Slave()`. We should fix it in a separate review.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review127195
-----------------------------------------------------------


On April 4, 2016, 10:31 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 10:31 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 45689: Make Cluster::Slave more tolerant of start failures.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45689/#review127195
-----------------------------------------------------------




src/tests/cluster.cpp (lines 364 - 366)
<https://reviews.apache.org/r/45689/#comment190367>

    The destructor will only dereference a null `containerizer` if this error case is hit (or if you pass in a `nullptr` cast to `slave::Containerizer*`).
    
    At this point in the code, we haven't done anything other than create some objects.  So it should be ok to do something like this in the destructor:
    ```
    if (containerizer == nullptr) {
      return;
    }
    ```
    The call to `terminate()` only matters if the `cluster::Slave` is created successfully.


- Joseph Wu


On April 4, 2016, 10:31 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45689/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 10:31 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Joseph Wu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If Cluster::Slave::start() fails, make sure we don't crash in the
> destructor.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp 14d0d34fcb4c408ad996672394c39c84fd2be918 
> 
> Diff: https://reviews.apache.org/r/45689/diff/
> 
> 
> Testing
> -------
> 
> Make check.
> 
> 
> Thanks,
> 
> James Peach
> 
>