You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@apache.org> on 2019/03/22 06:14:53 UTC

Re: Review Request 70216: Cleanup the recovery logic for refactoring SLRP.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70216/
-----------------------------------------------------------

(Updated March 22, 2019, 6:14 a.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Benjamin's comments and restructure the refactoring. Now it's done in SLRP first.


Summary (updated)
-----------------

Cleanup the recovery logic for refactoring SLRP.


Bugs: MESOS-9622
    https://issues.apache.org/jira/browse/MESOS-9622


Repository: mesos


Description (updated)
-------

In addition to perform volume state recovery, the `recoverVolumes`
function also recovers service manager and preparing services now. The
whole logic will be moved out from SLRP to v0 `VolumeManager` later.

During volume state recovery, we no longer recover all volumes to steady
states, since transient states are properly handled in SLRP. To simplify
the recovery logic, a `publishVolume` method that conforms to the volume
manager's `publishVolume` is introduced.


Diffs (updated)
-----

  src/resource_provider/storage/provider.cpp fea623c292158deb1b4b4b9ab1ac208031471519 
  src/resource_provider/storage/provider_process.hpp a5536b3d735e01eb1c4dc52d0602d973155f3c93 


Diff: https://reviews.apache.org/r/70216/diff/3/

Changes: https://reviews.apache.org/r/70216/diff/2-3/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70216: Cleanup the recovery logic for refactoring SLRP.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On March 25, 2019, 11:46 a.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 618 (original)
> > <https://reviews.apache.org/r/70216/diff/4/?file=2133677#file2133677line707>
> >
> >     WE are we not checkpointing this anymore since it was always a no-op anyway, right?

Right. Checkpointing such states are not very useful, as they are meant to keep track of external interactions (i.e., making calls to the plugin). If another crash happens, the same recovery logic will put it back to `NODE_READY` again.


> On March 25, 2019, 11:46 a.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 1736 (original), 1553 (patched)
> > <https://reviews.apache.org/r/70216/diff/4/?file=2133677#file2133677line1884>
> >
> >     Should we add a message, here and below?

This will be removed in later patches. Eventually `VolumeManager` would return a failure in all public-facing functions if `volumeId` is unknown.


> On March 25, 2019, 11:46 a.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 1736 (original), 1553 (patched)
> > <https://reviews.apache.org/r/70216/diff/4/?file=2133677#file2133677line1884>
> >
> >     Should we add a message, here and below?

This will be removed in the next patch.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70216/#review213977
-----------------------------------------------------------


On March 27, 2019, 5:57 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70216/
> -----------------------------------------------------------
> 
> (Updated March 27, 2019, 5:57 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9622
>     https://issues.apache.org/jira/browse/MESOS-9622
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In addition to perform volume state recovery, the `recoverVolumes`
> function also recovers service manager and preparing services now. The
> whole logic will be moved out from SLRP to v0 `VolumeManager` later.
> 
> During volume state recovery, we no longer recover all volumes to steady
> states, since transient states are properly handled in SLRP. To simplify
> the recovery logic, a `publishVolume` method that conforms to the volume
> manager's `publishVolume` is introduced.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp 2711503cdb58cb9b34af8c9fad0908c5f788a2af 
>   src/resource_provider/storage/provider_process.hpp a5536b3d735e01eb1c4dc52d0602d973155f3c93 
> 
> 
> Diff: https://reviews.apache.org/r/70216/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70216: Cleanup the recovery logic for refactoring SLRP.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70216/#review213977
-----------------------------------------------------------


Fix it, then Ship it!




Thanks for the patch reorganization, this has less noise now and is not as hard to understand anymore.


src/resource_provider/storage/provider.cpp
Lines 615 (patched)
<https://reviews.apache.org/r/70216/#comment300118>

    `[this]`



src/resource_provider/storage/provider.cpp
Line 618 (original)
<https://reviews.apache.org/r/70216/#comment300120>

    WE are we not checkpointing this anymore since it was always a no-op anyway, right?



src/resource_provider/storage/provider.cpp
Line 1736 (original), 1553 (patched)
<https://reviews.apache.org/r/70216/#comment300123>

    Should we add a message, here and below?



src/resource_provider/storage/provider.cpp
Lines 1559 (patched)
<https://reviews.apache.org/r/70216/#comment300122>

    `[this, volumeId]`


- Benjamin Bannier


On March 23, 2019, 12:31 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70216/
> -----------------------------------------------------------
> 
> (Updated March 23, 2019, 12:31 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9622
>     https://issues.apache.org/jira/browse/MESOS-9622
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In addition to perform volume state recovery, the `recoverVolumes`
> function also recovers service manager and preparing services now. The
> whole logic will be moved out from SLRP to v0 `VolumeManager` later.
> 
> During volume state recovery, we no longer recover all volumes to steady
> states, since transient states are properly handled in SLRP. To simplify
> the recovery logic, a `publishVolume` method that conforms to the volume
> manager's `publishVolume` is introduced.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp fea623c292158deb1b4b4b9ab1ac208031471519 
>   src/resource_provider/storage/provider_process.hpp a5536b3d735e01eb1c4dc52d0602d973155f3c93 
> 
> 
> Diff: https://reviews.apache.org/r/70216/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70216: Cleanup the recovery logic for refactoring SLRP.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70216/
-----------------------------------------------------------

(Updated March 28, 2019, 7:55 a.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Benjamin's comments.


Bugs: MESOS-9622
    https://issues.apache.org/jira/browse/MESOS-9622


Repository: mesos


Description
-------

In addition to perform volume state recovery, the `recoverVolumes`
function also recovers service manager and preparing services now. The
whole logic will be moved out from SLRP to v0 `VolumeManager` later.

During volume state recovery, we no longer recover all volumes to steady
states, since transient states are properly handled in SLRP. To simplify
the recovery logic, a `publishVolume` method that conforms to the volume
manager's `publishVolume` is introduced.


Diffs (updated)
-----

  src/resource_provider/storage/provider.cpp 2711503cdb58cb9b34af8c9fad0908c5f788a2af 
  src/resource_provider/storage/provider_process.hpp a5536b3d735e01eb1c4dc52d0602d973155f3c93 


Diff: https://reviews.apache.org/r/70216/diff/6/

Changes: https://reviews.apache.org/r/70216/diff/5-6/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70216: Cleanup the recovery logic for refactoring SLRP.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70216/
-----------------------------------------------------------

(Updated March 27, 2019, 5:57 a.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Rebased.


Bugs: MESOS-9622
    https://issues.apache.org/jira/browse/MESOS-9622


Repository: mesos


Description
-------

In addition to perform volume state recovery, the `recoverVolumes`
function also recovers service manager and preparing services now. The
whole logic will be moved out from SLRP to v0 `VolumeManager` later.

During volume state recovery, we no longer recover all volumes to steady
states, since transient states are properly handled in SLRP. To simplify
the recovery logic, a `publishVolume` method that conforms to the volume
manager's `publishVolume` is introduced.


Diffs (updated)
-----

  src/resource_provider/storage/provider.cpp 2711503cdb58cb9b34af8c9fad0908c5f788a2af 
  src/resource_provider/storage/provider_process.hpp a5536b3d735e01eb1c4dc52d0602d973155f3c93 


Diff: https://reviews.apache.org/r/70216/diff/5/

Changes: https://reviews.apache.org/r/70216/diff/4-5/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70216: Cleanup the recovery logic for refactoring SLRP.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70216/
-----------------------------------------------------------

(Updated March 22, 2019, 11:31 p.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Inlined an internal recovery helper so the recovery logic is in one place.


Bugs: MESOS-9622
    https://issues.apache.org/jira/browse/MESOS-9622


Repository: mesos


Description
-------

In addition to perform volume state recovery, the `recoverVolumes`
function also recovers service manager and preparing services now. The
whole logic will be moved out from SLRP to v0 `VolumeManager` later.

During volume state recovery, we no longer recover all volumes to steady
states, since transient states are properly handled in SLRP. To simplify
the recovery logic, a `publishVolume` method that conforms to the volume
manager's `publishVolume` is introduced.


Diffs (updated)
-----

  src/resource_provider/storage/provider.cpp fea623c292158deb1b4b4b9ab1ac208031471519 
  src/resource_provider/storage/provider_process.hpp a5536b3d735e01eb1c4dc52d0602d973155f3c93 


Diff: https://reviews.apache.org/r/70216/diff/4/

Changes: https://reviews.apache.org/r/70216/diff/3-4/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao