You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2013/11/01 01:18:46 UTC

Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Nov. 1, 2013, 12:18 a.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Using the new Shared::upgrade().

Tests:
bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.* --gtest_repeat=100


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am a11c76b 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/messages/log.proto 3d5859f 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review28258
-----------------------------------------------------------



src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment55002>

    Let's describe the parameters in a bit more detail. In particular, I think that 'proposal' needs some explaining (i.e., why couldn't catchup figure out a proposal number to use itself?).



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55001>

    Kill "We use this function to make sure that a log position is learned locally.".



src/tests/log_tests.cpp
<https://reviews.apache.org/r/14631/#comment56105>

    How about we just terminate 'replica2' instead? That is actually more in line with what we're trying to test too!
    
    Maybe something like:
    
    process::terminate(replica2->pid());
    process::wait(replica2->pid());
    replica2.reset();


- Benjamin Hindman


On Jan. 16, 2014, 6:37 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 6:37 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf0c8c6 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review32102
-----------------------------------------------------------

Ship it!


Ship It!

- Benjamin Hindman


On Jan. 16, 2014, 11:37 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 11:37 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf0c8c6 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 11:37 p.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Rebased. BenH's comment.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am cf0c8c6 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 6:37 a.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Rebase.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am cf0c8c6 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Nov. 25, 2013, 5:52 a.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Rebased on (Log 0: https://reviews.apache.org/r/15799/).


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am feda34b 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/messages/log.proto 3d5859f 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Nov. 23, 2013, 12:49 a.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Addressed BenM's comments.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am feda34b 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/messages/log.proto 3d5859f 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.

> On Nov. 15, 2013, 3:06 a.m., Ben Mahler wrote:
> > src/log/coordinator.cpp, lines 500-503
> > <https://reviews.apache.org/r/14631/diff/14/?file=377409#file377409line500>
> >
> >     Are we writing the same data? Technically, aren't we changing the learned bit of the Action when we write it after log::learn?

Yes, we are writing the same data.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review28918
-----------------------------------------------------------


On Nov. 13, 2013, 8:41 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2013, 8:41 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 9780d07 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/messages/log.proto 3d5859f 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review28918
-----------------------------------------------------------


This looks good, great cleanup! :)

Here's how I would have rebased and pulled apart this change into a few logical changes:
  1. Updated Network::Broadcast to return a future.
  2. Renamed id/coordinator to proposal in log.proto and a few of the C++ files.
  3. Added a new missing() call on Replica along with the added const correctness.
  4. The meat of the change :)

Although it may seem minor, I think review time/difficulty increases quadratically or more with the size of a diff (especially when Paxos is involved ;)), so every bit helps! Anyway, good stuff!


src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55983>

    4 space indent



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55985>

    4 space indent



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55987>

    Can you add the position number?
    
    "Failed to catch-up position " + stringify(*it) + ": " + catching.failure()



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55986>

    Can you elaborate on this comment? It's not clear to me where the unnecessary proposal number bumps will originate and why this prevents it



src/log/consensus.hpp
<https://reviews.apache.org/r/14631/#comment55995>

    s/gain/gained/



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55990>

    indent with 4 spaces



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55991>

    Should this be CHECK_EQ? CHECK_EQ will print the actual vs expected when it fails



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55992>

    CHECK_EQ?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55993>

    CHECK_EQ?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55994>

    4 spaces



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55996>

    4 spaces



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55997>

    LOG(FATAL) << "Unknown Action::Type " << action.type();



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55998>

    CHECK_EQ



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55999>

    4 spaces



src/log/coordinator.hpp
<https://reviews.apache.org/r/14631/#comment55988>

    Where did the documentation go?



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment56000>

    Are we writing the same data? Technically, aren't we changing the learned bit of the Action when we write it after log::learn?


- Ben Mahler


On Nov. 13, 2013, 8:41 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2013, 8:41 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 9780d07 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/messages/log.proto 3d5859f 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Nov. 13, 2013, 8:41 p.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Updated "depends on".


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs
-----

  src/Makefile.am 9780d07 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/messages/log.proto 3d5859f 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------

(Updated Nov. 5, 2013, 12:02 a.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Addressed BenH's comments.

Tests:
bin/mesos-tests.sh --gtest_filter=CoordinatorTest.*:ReplicaTest.*:LogTest.* --gtest_repeat=100


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736

Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()

1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.

2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.

3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.

4) Modified the log tests to adapt to the new semantics (see (3) above)

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am 9780d07 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/messages/log.proto 3d5859f 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14631/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.

> On Nov. 4, 2013, 8:45 p.m., Benjamin Hindman wrote:
> > It would be great to include some documentation that lists all possible proposers for the replicas.

Done. Added a paragraph in src/log/consensus.hpp.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27506
-----------------------------------------------------------


On Nov. 1, 2013, 12:18 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Nov. 1, 2013, 12:18 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am a11c76b 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/messages/log.proto 3d5859f 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Jie Yu <yu...@gmail.com>.
Ben, FYI, your last review is not for the latest version.

I addressed all your comments accordingly in the newest version.

- Jie


On Mon, Nov 4, 2013 at 12:45 PM, Benjamin Hindman <be...@berkeley.edu> wrote:

>    This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
>
> It would be great to include some documentation that lists all possible proposers for the replicas.
>
>
>    src/log/log.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370346#file370346line190> (Diff
> revision 6)
>
> public:
>
>   186
>
>     quorum = _quorum;
>
> 190
>
>     replica.reset(new Replica(path));
>
>   Any reason not to do this in the initializer list above?
>
>
>    src/log/log.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370346#file370346line222> (Diff
> revision 6)
>
> public:
>
>   213
>
>     replica = new Replica(path);
>
> 217
>
>     replica.reset(new Replica(path));
>
>   214
>
> 218
>
>   215
>
>     group = new zookeeper::Group(servers, timeout, znode, auth);
>
> 219
>
>     network.reset(new ZooKeeperNetwork(servers, timeout, znode, auth));
>
>   Any reason not to do these in the initializer list above? I.e.,
>
> group(new zookeeper::Group(servers, timeout, znode, auth)),
> quorum(_quorum),
> replica(new Replica(path)),
> network(new ZooKeeperNetwork(servers, timeout, znode, auth))
> {
>
>
>    src/log/log.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370346#file370346line283> (Diff
> revision 6)
>
> public:
>
>    276
>
>   // For renewing membership.
>
>   264
>
>   zookeeper::Group* group;
>
> 277
>
>   zookeeper::Group* group;
>
>   How about a bit more context? I.e., "We store a Group instance in order to continually renew the replicas membership (when using ZooKeeper)."
>
>
>    src/log/network.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370347#file370347line258> (Diff
> revision 6)
>
> inline void Network::set(const std::set<process::UPID>& pids)
>
>    258
>
>   group = new zookeeper::Group(servers, timeout, znode, auth);
>
>   Does it need to be a pointer? Also, can ZooKeeperNetwork be copied? Do we want it to be copyable?
>
>
>    src/log/replica.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370348#file370348line63> (Diff
> revision 6)
>
> public:
>
>    60
>
>       uint64_t from, uint64_t to) const;
>
>   Let's put each parameter on it's own line.
>
>
>    src/log/replica.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370348#file370348line72> (Diff
> revision 6)
>
> public:
>
>    69
>
>       uint64_t from, uint64_t to) const;
>
>   Each parameter on it's own line please.
>
>
>    src/log/replica.cpp<https://reviews.apache.org/r/14631/diff/6/?file=370349#file370349line540> (Diff
> revision 6)
>
> Try<Action> LevelDBStorage::read(uint64_t position)
>
>   536
>
>   // Last promise made to a coordinator.
>
> 534
>
>   // Last promise made to a coordinator.
>
>   s/coordinator/proposer/
>
>
>    src/messages/log.proto<https://reviews.apache.org/r/14631/diff/6/?file=370350#file370350line22> (Diff
> revision 6)    22
>
> // Represents a "promise" that a replica has made to a coordinator. A
>
> 22
>
> // Represents a "promise" that a replica has made to a coordinator. A
>
>   s/to a coordinator//
>
>
>    src/messages/log.proto<https://reviews.apache.org/r/14631/diff/6/?file=370350#file370350line25> (Diff
> revision 6)    25
>
> // same coordinator), until a new promise is made to a coordinator
>
> 25
>
> // same coordinator), until a new promise is made to a coordinator
>
>   s/coordinator/proposer/
>
>
> - Benjamin Hindman
>
> On November 1st, 2013, 12:18 a.m. UTC, Jie Yu wrote:
>   Review request for mesos and Benjamin Hindman.
> By Jie Yu.
>
> *Updated Nov. 1, 2013, 12:18 a.m.*
>  *Bugs: * MESOS-736 <https://issues.apache.org/jira/browse/MESOS-736>
>  *Repository: * mesos-git
> Description
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>   Testing
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>   Diffs
>
>    - src/Makefile.am (a11c76b)
>    - src/log/catchup.hpp (PRE-CREATION)
>    - src/log/catchup.cpp (PRE-CREATION)
>    - src/log/consensus.hpp (PRE-CREATION)
>    - src/log/consensus.cpp (PRE-CREATION)
>    - src/log/coordinator.hpp (3f6fb7c)
>    - src/log/coordinator.cpp (6e6466f)
>    - src/log/log.hpp (77edc7a)
>    - src/log/network.hpp (d34cf78)
>    - src/log/replica.hpp (d1f5ead)
>    - src/log/replica.cpp (59a6ff3)
>    - src/messages/log.proto (3d5859f)
>    - src/tests/log_tests.cpp (ff5f86c)
>
> View Diff <https://reviews.apache.org/r/14631/diff/>
>

Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator logics and made them asynchronous.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27506
-----------------------------------------------------------


It would be great to include some documentation that lists all possible proposers for the replicas.


src/log/log.hpp
<https://reviews.apache.org/r/14631/#comment53434>

    Any reason not to do this in the initializer list above?



src/log/log.hpp
<https://reviews.apache.org/r/14631/#comment53433>

    Any reason not to do these in the initializer list above? I.e.,
    
    group(new zookeeper::Group(servers, timeout, znode, auth)),
    quorum(_quorum),
    replica(new Replica(path)),
    network(new ZooKeeperNetwork(servers, timeout, znode, auth))
    {



src/log/log.hpp
<https://reviews.apache.org/r/14631/#comment53435>

    How about a bit more context? I.e., "We store a Group instance in order to continually renew the replicas membership (when using ZooKeeper)."



src/log/network.hpp
<https://reviews.apache.org/r/14631/#comment53436>

    Does it need to be a pointer? Also, can ZooKeeperNetwork be copied? Do we want it to be copyable?



src/log/replica.hpp
<https://reviews.apache.org/r/14631/#comment53437>

    Let's put each parameter on it's own line.



src/log/replica.hpp
<https://reviews.apache.org/r/14631/#comment53438>

    Each parameter on it's own line please.



src/log/replica.cpp
<https://reviews.apache.org/r/14631/#comment53441>

    s/coordinator/proposer/



src/messages/log.proto
<https://reviews.apache.org/r/14631/#comment53439>

    s/to a coordinator//



src/messages/log.proto
<https://reviews.apache.org/r/14631/#comment53440>

    s/coordinator/proposer/


- Benjamin Hindman


On Nov. 1, 2013, 12:18 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Nov. 1, 2013, 12:18 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am a11c76b 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/messages/log.proto 3d5859f 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>