You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Gilbert Song <so...@gmail.com> on 2017/02/18 01:54:36 UTC

Review Request 56808: Fixed nested container agent flapping issue after reboot.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56808/
-----------------------------------------------------------

Review request for mesos, Avinash sridharan, Jie Yu, and Kevin Klues.


Repository: mesos


Description
-------

Fixed nested container agent flapping issue after reboot.


Diffs
-----

  src/slave/containerizer/mesos/provisioner/provisioner.hpp ff52e35ac44fea70fe2031e6ac537c613c57f50f 
  src/slave/containerizer/mesos/provisioner/provisioner.cpp 8c20d4077859d437999467d045dfd500c1e576dd 

Diff: https://reviews.apache.org/r/56808/diff/


Testing
-------

make check


Thanks,

Gilbert Song


Re: Review Request 56808: Fixed nested container agent flapping issue after reboot.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56808/
-----------------------------------------------------------

(Updated Feb. 21, 2017, 2:34 p.m.)


Review request for mesos, Avinash sridharan, Jie Yu, and Kevin Klues.


Bugs: MESOS-7152
    https://issues.apache.org/jira/browse/MESOS-7152


Repository: mesos


Description
-------

When recovering containers in provisioner, there is a particular case
that after the machine reboots the container runtime directory and
slave state is gone but the provisioner directory still exists since
it is under the agent work_dir(e.g., agent work_dir is /var/lib/mesos).
Then, all checkpointed containers will be cleaned up as unknown
containers in provisioner during recovery. However, the semantic that
a child container is always cleaned up before its parent container
cannot be guaranteed for this particular case. Ideally, we should
put the provisioner directory under the container runtime dir but this
is not backward compactible. It is an unfortunate that we have to
make the provisioner::destroy() to be recursive.


Diffs
-----

  src/slave/containerizer/mesos/provisioner/provisioner.hpp ff52e35ac44fea70fe2031e6ac537c613c57f50f 
  src/slave/containerizer/mesos/provisioner/provisioner.cpp 8c20d4077859d437999467d045dfd500c1e576dd 

Diff: https://reviews.apache.org/r/56808/diff/


Testing
-------

make check


Thanks,

Gilbert Song


Re: Review Request 56808: Fixed nested container agent flapping issue after reboot.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56808/
-----------------------------------------------------------

(Updated Feb. 21, 2017, 1:54 p.m.)


Review request for mesos, Avinash sridharan, Jie Yu, and Kevin Klues.


Repository: mesos


Description (updated)
-------

When recovering containers in provisioner, there is a particular case
that after the machine reboots the container runtime directory and
slave state is gone but the provisioner directory still exists since
it is under the agent work_dir(e.g., agent work_dir is /var/lib/mesos).
Then, all checkpointed containers will be cleaned up as unknown
containers in provisioner during recovery. However, the semantic that
a child container is always cleaned up before its parent container
cannot be guaranteed for this particular case. Ideally, we should
put the provisioner directory under the container runtime dir but this
is not backward compactible. It is an unfortunate that we have to
make the provisioner::destroy() to be recursive.


Diffs (updated)
-----

  src/slave/containerizer/mesos/provisioner/provisioner.hpp ff52e35ac44fea70fe2031e6ac537c613c57f50f 
  src/slave/containerizer/mesos/provisioner/provisioner.cpp 8c20d4077859d437999467d045dfd500c1e576dd 

Diff: https://reviews.apache.org/r/56808/diff/


Testing
-------

make check


Thanks,

Gilbert Song


Re: Review Request 56808: Fixed nested container agent flapping issue after reboot.

Posted by Gilbert Song <so...@gmail.com>.

> On Feb. 19, 2017, 8:22 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/provisioner/provisioner.hpp, line 134
> > <https://reviews.apache.org/r/56808/diff/1/?file=1639895#file1639895line134>
> >
> >     Any reason you need this state? Can 'state' just been a boolean (i.e., `destorying`)?

Thought about making it just as a boolean previously. Considering it is possible that some other status may be useful in the future so that I introduced an enum with two initialized status `PROVISIONING` and `DESTROYING`.

Fixed depending on your comment. Please let me know if you change your mind and let's change it back.


- Gilbert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56808/#review166046
-----------------------------------------------------------


On Feb. 21, 2017, 1:54 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56808/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2017, 1:54 p.m.)
> 
> 
> Review request for mesos, Avinash sridharan, Jie Yu, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When recovering containers in provisioner, there is a particular case
> that after the machine reboots the container runtime directory and
> slave state is gone but the provisioner directory still exists since
> it is under the agent work_dir(e.g., agent work_dir is /var/lib/mesos).
> Then, all checkpointed containers will be cleaned up as unknown
> containers in provisioner during recovery. However, the semantic that
> a child container is always cleaned up before its parent container
> cannot be guaranteed for this particular case. Ideally, we should
> put the provisioner directory under the container runtime dir but this
> is not backward compactible. It is an unfortunate that we have to
> make the provisioner::destroy() to be recursive.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/provisioner/provisioner.hpp ff52e35ac44fea70fe2031e6ac537c613c57f50f 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 8c20d4077859d437999467d045dfd500c1e576dd 
> 
> Diff: https://reviews.apache.org/r/56808/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>


Re: Review Request 56808: Fixed nested container agent flapping issue after reboot.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56808/#review166046
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/containerizer/mesos/provisioner/provisioner.hpp (line 134)
<https://reviews.apache.org/r/56808/#comment237949>

    Any reason you need this state? Can 'state' just been a boolean (i.e., `destorying`)?



src/slave/containerizer/mesos/provisioner/provisioner.cpp (line 498)
<https://reviews.apache.org/r/56808/#comment237950>

    Why remove case (1) in the original comment? I think case 1) and case 2) in the original comment are for different cases.



src/slave/containerizer/mesos/provisioner/provisioner.cpp (line 547)
<https://reviews.apache.org/r/56808/#comment237951>

    Not yours, please use const ref here.


- Jie Yu


On Feb. 18, 2017, 1:54 a.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56808/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2017, 1:54 a.m.)
> 
> 
> Review request for mesos, Avinash sridharan, Jie Yu, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed nested container agent flapping issue after reboot.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/provisioner/provisioner.hpp ff52e35ac44fea70fe2031e6ac537c613c57f50f 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 8c20d4077859d437999467d045dfd500c1e576dd 
> 
> Diff: https://reviews.apache.org/r/56808/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>