You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Ilya Pronin <ip...@twopensource.com> on 2017/09/13 16:40:47 UTC

Review Request 62286: Added recoverMissing log process.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62286/
-----------------------------------------------------------

Review request for mesos and Jie Yu.


Bugs: MESOS-7973
    https://issues.apache.org/jira/browse/MESOS-7973


Repository: mesos


Description
-------

The process is used to "recover" a non-leading VOTING replica by running
the recovery protocol to find current begin and end positions of the log
and catching-up positions that are missing on the replica. This allows
following replicas to serve eventually consistent reads.


Diffs
-----

  src/log/recover.hpp 94d6bb283ea82df607fcc72de05cda3b12ad5d2d 
  src/log/recover.cpp b366818767adb259ee6805f9978fd330b790a264 
  src/tests/log_tests.cpp f9f9400c901152779ae0ebfe74cf8f7aac1d3396 


Diff: https://reviews.apache.org/r/62286/diff/1/


Testing
-------

Added tests that verify that the new recovery process correctly performs and produces meaningful result under various circumstances (recovered positions were truncated, replica was lagging far behind). Ran `make check`.


Thanks,

Ilya Pronin


Re: Review Request 62286: Added CatchupMissing log process.

Posted by Ilya Pronin <ip...@twopensource.com>.

> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 652 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line652>
> >
> >     Putting this in `recover.cpp|hpp` is very wierd. I am leaning towards moving this too `catchup.hpp|cpp` and just overload `catchup` method (without positions).

Done!


> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 701 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line701>
> >
> >     Any reason not returnning a failure in this case?

No. Fixed.


> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 711 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line711>
> >
> >     Can you add a few comments on this. THis is a bit hacky because we can essentially hijacting the recovery protocol. Ideally, we probably should split the recovery into several phases and uses on the phase that's relevant to this.

Per our offline discussion, exposed `runRecoverProtocol()` in `recover.hpp` and moved retry logic from `RecoverProtocolProcess` to `RecoverProcess`.


> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 730 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line730>
> >
> >     Can you explain why we need to recover the same `begin` after a restart?

Discussed offline.


- Ilya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62286/#review186084
-----------------------------------------------------------


On Oct. 4, 2017, 12:16 a.m., Ilya Pronin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62286/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2017, 12:16 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7973
>     https://issues.apache.org/jira/browse/MESOS-7973
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The process is used to catch-up a non-leading VOTING replica by running
> the recovery protocol to find current begin and end positions of the log
> and catching-up positions that are missing on the replica. This allows
> following replicas to serve eventually consistent reads.
> 
> 
> Diffs
> -----
> 
>   src/log/catchup.hpp 123bc7a57e5e89f9ba75c36ba0cbe5ead807c518 
>   src/log/catchup.cpp 94e1b00db2cd9d5a2368a979c1fd155bb6cac1f2 
>   src/tests/log_tests.cpp f9f9400c901152779ae0ebfe74cf8f7aac1d3396 
> 
> 
> Diff: https://reviews.apache.org/r/62286/diff/2/
> 
> 
> Testing
> -------
> 
> Added tests that verify that the new recovery process correctly performs and produces meaningful result under various circumstances (recovered positions were truncated, replica was lagging far behind). Ran `make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>


Re: Review Request 62286: Added recoverMissing log process.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62286/#review186084
-----------------------------------------------------------




src/log/recover.cpp
Lines 652 (patched)
<https://reviews.apache.org/r/62286/#comment262513>

    Putting this in `recover.cpp|hpp` is very wierd. I am leaning towards moving this too `catchup.hpp|cpp` and just overload `catchup` method (without positions).



src/log/recover.cpp
Lines 701 (patched)
<https://reviews.apache.org/r/62286/#comment262510>

    Any reason not returnning a failure in this case?



src/log/recover.cpp
Lines 711 (patched)
<https://reviews.apache.org/r/62286/#comment262512>

    Can you add a few comments on this. THis is a bit hacky because we can essentially hijacting the recovery protocol. Ideally, we probably should split the recovery into several phases and uses on the phase that's relevant to this.



src/log/recover.cpp
Lines 730 (patched)
<https://reviews.apache.org/r/62286/#comment262511>

    Can you explain why we need to recover the same `begin` after a restart?


- Jie Yu


On Sept. 13, 2017, 4:40 p.m., Ilya Pronin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62286/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2017, 4:40 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7973
>     https://issues.apache.org/jira/browse/MESOS-7973
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The process is used to "recover" a non-leading VOTING replica by running
> the recovery protocol to find current begin and end positions of the log
> and catching-up positions that are missing on the replica. This allows
> following replicas to serve eventually consistent reads.
> 
> 
> Diffs
> -----
> 
>   src/log/recover.hpp 94d6bb283ea82df607fcc72de05cda3b12ad5d2d 
>   src/log/recover.cpp b366818767adb259ee6805f9978fd330b790a264 
>   src/tests/log_tests.cpp f9f9400c901152779ae0ebfe74cf8f7aac1d3396 
> 
> 
> Diff: https://reviews.apache.org/r/62286/diff/1/
> 
> 
> Testing
> -------
> 
> Added tests that verify that the new recovery process correctly performs and produces meaningful result under various circumstances (recovered positions were truncated, replica was lagging far behind). Ran `make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>


Re: Review Request 62286: Added CatchupMissing log process.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62286/#review194370
-----------------------------------------------------------


Ship it!




Ship It!

- Jie Yu


On Oct. 3, 2017, 11:16 p.m., Ilya Pronin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62286/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2017, 11:16 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7973
>     https://issues.apache.org/jira/browse/MESOS-7973
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The process is used to catch-up a non-leading VOTING replica by running
> the recovery protocol to find current begin and end positions of the log
> and catching-up positions that are missing on the replica. This allows
> following replicas to serve eventually consistent reads.
> 
> 
> Diffs
> -----
> 
>   src/log/catchup.hpp 123bc7a57e5e89f9ba75c36ba0cbe5ead807c518 
>   src/log/catchup.cpp 94e1b00db2cd9d5a2368a979c1fd155bb6cac1f2 
>   src/tests/log_tests.cpp f9f9400c901152779ae0ebfe74cf8f7aac1d3396 
> 
> 
> Diff: https://reviews.apache.org/r/62286/diff/2/
> 
> 
> Testing
> -------
> 
> Added tests that verify that the new recovery process correctly performs and produces meaningful result under various circumstances (recovered positions were truncated, replica was lagging far behind). Ran `make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>


Re: Review Request 62286: Added CatchupMissing log process.

Posted by Ilya Pronin <ip...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62286/
-----------------------------------------------------------

(Updated Oct. 4, 2017, 12:16 a.m.)


Review request for mesos and Jie Yu.


Changes
-------

Addressed review comments.


Summary (updated)
-----------------

Added CatchupMissing log process.


Bugs: MESOS-7973
    https://issues.apache.org/jira/browse/MESOS-7973


Repository: mesos


Description (updated)
-------

The process is used to catch-up a non-leading VOTING replica by running
the recovery protocol to find current begin and end positions of the log
and catching-up positions that are missing on the replica. This allows
following replicas to serve eventually consistent reads.


Diffs (updated)
-----

  src/log/catchup.hpp 123bc7a57e5e89f9ba75c36ba0cbe5ead807c518 
  src/log/catchup.cpp 94e1b00db2cd9d5a2368a979c1fd155bb6cac1f2 
  src/tests/log_tests.cpp f9f9400c901152779ae0ebfe74cf8f7aac1d3396 


Diff: https://reviews.apache.org/r/62286/diff/2/

Changes: https://reviews.apache.org/r/62286/diff/1-2/


Testing
-------

Added tests that verify that the new recovery process correctly performs and produces meaningful result under various circumstances (recovered positions were truncated, replica was lagging far behind). Ran `make check`.


Thanks,

Ilya Pronin