You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2013/11/01 01:18:46 UTC
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Nov. 1, 2013, 12:18 a.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Using the new Shared::upgrade().
Tests:
bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.* --gtest_repeat=100
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs (updated)
-----
src/Makefile.am a11c76b
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 59a6ff3
src/messages/log.proto 3d5859f
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review28258
-----------------------------------------------------------
src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment55002>
Let's describe the parameters in a bit more detail. In particular, I think that 'proposal' needs some explaining (i.e., why couldn't catchup figure out a proposal number to use itself?).
src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55001>
Kill "We use this function to make sure that a log position is learned locally.".
src/tests/log_tests.cpp
<https://reviews.apache.org/r/14631/#comment56105>
How about we just terminate 'replica2' instead? That is actually more in line with what we're trying to test too!
Maybe something like:
process::terminate(replica2->pid());
process::wait(replica2->pid());
replica2.reset();
- Benjamin Hindman
On Jan. 16, 2014, 6:37 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
>
> (Updated Jan. 16, 2014, 6:37 a.m.)
>
>
> Review request for mesos and Benjamin Hindman.
>
>
> Bugs: MESOS-736
> https://issues.apache.org/jira/browse/MESOS-736
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>
> Diffs
> -----
>
> src/Makefile.am cf0c8c6
> src/log/catchup.hpp PRE-CREATION
> src/log/catchup.cpp PRE-CREATION
> src/log/consensus.hpp PRE-CREATION
> src/log/consensus.cpp PRE-CREATION
> src/log/coordinator.hpp 3f6fb7c
> src/log/coordinator.cpp 6e6466f
> src/log/log.hpp 77edc7a
> src/log/network.hpp d34cf78
> src/log/replica.hpp d1f5ead
> src/log/replica.cpp 82c2157
> src/messages/log.proto e6460ab
> src/tests/log_tests.cpp ff5f86c
>
> Diff: https://reviews.apache.org/r/14631/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review32102
-----------------------------------------------------------
Ship it!
Ship It!
- Benjamin Hindman
On Jan. 16, 2014, 11:37 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
>
> (Updated Jan. 16, 2014, 11:37 p.m.)
>
>
> Review request for mesos and Benjamin Hindman.
>
>
> Bugs: MESOS-736
> https://issues.apache.org/jira/browse/MESOS-736
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>
> Diffs
> -----
>
> src/Makefile.am cf0c8c6
> src/log/catchup.hpp PRE-CREATION
> src/log/catchup.cpp PRE-CREATION
> src/log/consensus.hpp PRE-CREATION
> src/log/consensus.cpp PRE-CREATION
> src/log/coordinator.hpp 3f6fb7c
> src/log/coordinator.cpp 6e6466f
> src/log/log.hpp 77edc7a
> src/log/network.hpp d34cf78
> src/log/replica.hpp d1f5ead
> src/log/replica.cpp 82c2157
> src/messages/log.proto e6460ab
> src/tests/log_tests.cpp ff5f86c
>
> Diff: https://reviews.apache.org/r/14631/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Jan. 16, 2014, 11:37 p.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Rebased. BenH's comment.
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs (updated)
-----
src/Makefile.am cf0c8c6
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 82c2157
src/messages/log.proto e6460ab
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Jan. 16, 2014, 6:37 a.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Rebase.
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs (updated)
-----
src/Makefile.am cf0c8c6
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 82c2157
src/messages/log.proto e6460ab
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Nov. 25, 2013, 5:52 a.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Rebased on (Log 0: https://reviews.apache.org/r/15799/).
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs (updated)
-----
src/Makefile.am feda34b
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 59a6ff3
src/messages/log.proto 3d5859f
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Nov. 23, 2013, 12:49 a.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Addressed BenM's comments.
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs (updated)
-----
src/Makefile.am feda34b
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 59a6ff3
src/messages/log.proto 3d5859f
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
> On Nov. 15, 2013, 3:06 a.m., Ben Mahler wrote:
> > src/log/coordinator.cpp, lines 500-503
> > <https://reviews.apache.org/r/14631/diff/14/?file=377409#file377409line500>
> >
> > Are we writing the same data? Technically, aren't we changing the learned bit of the Action when we write it after log::learn?
Yes, we are writing the same data.
- Jie
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review28918
-----------------------------------------------------------
On Nov. 13, 2013, 8:41 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
>
> (Updated Nov. 13, 2013, 8:41 p.m.)
>
>
> Review request for mesos and Benjamin Hindman.
>
>
> Bugs: MESOS-736
> https://issues.apache.org/jira/browse/MESOS-736
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>
> Diffs
> -----
>
> src/Makefile.am 9780d07
> src/log/catchup.hpp PRE-CREATION
> src/log/catchup.cpp PRE-CREATION
> src/log/consensus.hpp PRE-CREATION
> src/log/consensus.cpp PRE-CREATION
> src/log/coordinator.hpp 3f6fb7c
> src/log/coordinator.cpp 6e6466f
> src/log/log.hpp 77edc7a
> src/log/network.hpp d34cf78
> src/log/replica.hpp d1f5ead
> src/log/replica.cpp 59a6ff3
> src/messages/log.proto 3d5859f
> src/tests/log_tests.cpp ff5f86c
>
> Diff: https://reviews.apache.org/r/14631/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review28918
-----------------------------------------------------------
This looks good, great cleanup! :)
Here's how I would have rebased and pulled apart this change into a few logical changes:
1. Updated Network::Broadcast to return a future.
2. Renamed id/coordinator to proposal in log.proto and a few of the C++ files.
3. Added a new missing() call on Replica along with the added const correctness.
4. The meat of the change :)
Although it may seem minor, I think review time/difficulty increases quadratically or more with the size of a diff (especially when Paxos is involved ;)), so every bit helps! Anyway, good stuff!
src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55983>
4 space indent
src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55985>
4 space indent
src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55987>
Can you add the position number?
"Failed to catch-up position " + stringify(*it) + ": " + catching.failure()
src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment55986>
Can you elaborate on this comment? It's not clear to me where the unnecessary proposal number bumps will originate and why this prevents it
src/log/consensus.hpp
<https://reviews.apache.org/r/14631/#comment55995>
s/gain/gained/
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55990>
indent with 4 spaces
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55991>
Should this be CHECK_EQ? CHECK_EQ will print the actual vs expected when it fails
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55992>
CHECK_EQ?
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55993>
CHECK_EQ?
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55994>
4 spaces
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55996>
4 spaces
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55997>
LOG(FATAL) << "Unknown Action::Type " << action.type();
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55998>
CHECK_EQ
src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment55999>
4 spaces
src/log/coordinator.hpp
<https://reviews.apache.org/r/14631/#comment55988>
Where did the documentation go?
src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment56000>
Are we writing the same data? Technically, aren't we changing the learned bit of the Action when we write it after log::learn?
- Ben Mahler
On Nov. 13, 2013, 8:41 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
>
> (Updated Nov. 13, 2013, 8:41 p.m.)
>
>
> Review request for mesos and Benjamin Hindman.
>
>
> Bugs: MESOS-736
> https://issues.apache.org/jira/browse/MESOS-736
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>
> Diffs
> -----
>
> src/Makefile.am 9780d07
> src/log/catchup.hpp PRE-CREATION
> src/log/catchup.cpp PRE-CREATION
> src/log/consensus.hpp PRE-CREATION
> src/log/consensus.cpp PRE-CREATION
> src/log/coordinator.hpp 3f6fb7c
> src/log/coordinator.cpp 6e6466f
> src/log/log.hpp 77edc7a
> src/log/network.hpp d34cf78
> src/log/replica.hpp d1f5ead
> src/log/replica.cpp 59a6ff3
> src/messages/log.proto 3d5859f
> src/tests/log_tests.cpp ff5f86c
>
> Diff: https://reviews.apache.org/r/14631/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Nov. 13, 2013, 8:41 p.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Updated "depends on".
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs
-----
src/Makefile.am 9780d07
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 59a6ff3
src/messages/log.proto 3d5859f
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/
-----------------------------------------------------------
(Updated Nov. 5, 2013, 12:02 a.m.)
Review request for mesos and Benjamin Hindman.
Changes
-------
Addressed BenH's comments.
Tests:
bin/mesos-tests.sh --gtest_filter=CoordinatorTest.*:ReplicaTest.*:LogTest.* --gtest_repeat=100
Bugs: MESOS-736
https://issues.apache.org/jira/browse/MESOS-736
Repository: mesos-git
Description
-------
This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
https://issues.apache.org/jira/browse/MESOS-736
Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
4) Modified the log tests to adapt to the new semantics (see (3) above)
This is a joint work with Yan Xu.
Diffs (updated)
-----
src/Makefile.am 9780d07
src/log/catchup.hpp PRE-CREATION
src/log/catchup.cpp PRE-CREATION
src/log/consensus.hpp PRE-CREATION
src/log/consensus.cpp PRE-CREATION
src/log/coordinator.hpp 3f6fb7c
src/log/coordinator.cpp 6e6466f
src/log/log.hpp 77edc7a
src/log/network.hpp d34cf78
src/log/replica.hpp d1f5ead
src/log/replica.cpp 59a6ff3
src/messages/log.proto 3d5859f
src/tests/log_tests.cpp ff5f86c
Diff: https://reviews.apache.org/r/14631/diff/
Testing
-------
bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
Thanks,
Jie Yu
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
> On Nov. 4, 2013, 8:45 p.m., Benjamin Hindman wrote:
> > It would be great to include some documentation that lists all possible proposers for the replicas.
Done. Added a paragraph in src/log/consensus.hpp.
- Jie
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27506
-----------------------------------------------------------
On Nov. 1, 2013, 12:18 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
>
> (Updated Nov. 1, 2013, 12:18 a.m.)
>
>
> Review request for mesos and Benjamin Hindman.
>
>
> Bugs: MESOS-736
> https://issues.apache.org/jira/browse/MESOS-736
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>
> Diffs
> -----
>
> src/Makefile.am a11c76b
> src/log/catchup.hpp PRE-CREATION
> src/log/catchup.cpp PRE-CREATION
> src/log/consensus.hpp PRE-CREATION
> src/log/consensus.cpp PRE-CREATION
> src/log/coordinator.hpp 3f6fb7c
> src/log/coordinator.cpp 6e6466f
> src/log/log.hpp 77edc7a
> src/log/network.hpp d34cf78
> src/log/replica.hpp d1f5ead
> src/log/replica.cpp 59a6ff3
> src/messages/log.proto 3d5859f
> src/tests/log_tests.cpp ff5f86c
>
> Diff: https://reviews.apache.org/r/14631/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled
coordinator logics and made them asynchronous.
Posted by Jie Yu <yu...@gmail.com>.
Ben, FYI, your last review is not for the latest version.
I addressed all your comments accordingly in the newest version.
- Jie
On Mon, Nov 4, 2013 at 12:45 PM, Benjamin Hindman <be...@berkeley.edu> wrote:
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
>
> It would be great to include some documentation that lists all possible proposers for the replicas.
>
>
> src/log/log.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370346#file370346line190> (Diff
> revision 6)
>
> public:
>
> 186
>
> quorum = _quorum;
>
> 190
>
> replica.reset(new Replica(path));
>
> Any reason not to do this in the initializer list above?
>
>
> src/log/log.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370346#file370346line222> (Diff
> revision 6)
>
> public:
>
> 213
>
> replica = new Replica(path);
>
> 217
>
> replica.reset(new Replica(path));
>
> 214
>
> 218
>
> 215
>
> group = new zookeeper::Group(servers, timeout, znode, auth);
>
> 219
>
> network.reset(new ZooKeeperNetwork(servers, timeout, znode, auth));
>
> Any reason not to do these in the initializer list above? I.e.,
>
> group(new zookeeper::Group(servers, timeout, znode, auth)),
> quorum(_quorum),
> replica(new Replica(path)),
> network(new ZooKeeperNetwork(servers, timeout, znode, auth))
> {
>
>
> src/log/log.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370346#file370346line283> (Diff
> revision 6)
>
> public:
>
> 276
>
> // For renewing membership.
>
> 264
>
> zookeeper::Group* group;
>
> 277
>
> zookeeper::Group* group;
>
> How about a bit more context? I.e., "We store a Group instance in order to continually renew the replicas membership (when using ZooKeeper)."
>
>
> src/log/network.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370347#file370347line258> (Diff
> revision 6)
>
> inline void Network::set(const std::set<process::UPID>& pids)
>
> 258
>
> group = new zookeeper::Group(servers, timeout, znode, auth);
>
> Does it need to be a pointer? Also, can ZooKeeperNetwork be copied? Do we want it to be copyable?
>
>
> src/log/replica.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370348#file370348line63> (Diff
> revision 6)
>
> public:
>
> 60
>
> uint64_t from, uint64_t to) const;
>
> Let's put each parameter on it's own line.
>
>
> src/log/replica.hpp<https://reviews.apache.org/r/14631/diff/6/?file=370348#file370348line72> (Diff
> revision 6)
>
> public:
>
> 69
>
> uint64_t from, uint64_t to) const;
>
> Each parameter on it's own line please.
>
>
> src/log/replica.cpp<https://reviews.apache.org/r/14631/diff/6/?file=370349#file370349line540> (Diff
> revision 6)
>
> Try<Action> LevelDBStorage::read(uint64_t position)
>
> 536
>
> // Last promise made to a coordinator.
>
> 534
>
> // Last promise made to a coordinator.
>
> s/coordinator/proposer/
>
>
> src/messages/log.proto<https://reviews.apache.org/r/14631/diff/6/?file=370350#file370350line22> (Diff
> revision 6) 22
>
> // Represents a "promise" that a replica has made to a coordinator. A
>
> 22
>
> // Represents a "promise" that a replica has made to a coordinator. A
>
> s/to a coordinator//
>
>
> src/messages/log.proto<https://reviews.apache.org/r/14631/diff/6/?file=370350#file370350line25> (Diff
> revision 6) 25
>
> // same coordinator), until a new promise is made to a coordinator
>
> 25
>
> // same coordinator), until a new promise is made to a coordinator
>
> s/coordinator/proposer/
>
>
> - Benjamin Hindman
>
> On November 1st, 2013, 12:18 a.m. UTC, Jie Yu wrote:
> Review request for mesos and Benjamin Hindman.
> By Jie Yu.
>
> *Updated Nov. 1, 2013, 12:18 a.m.*
> *Bugs: * MESOS-736 <https://issues.apache.org/jira/browse/MESOS-736>
> *Repository: * mesos-git
> Description
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
> Testing
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
> Diffs
>
> - src/Makefile.am (a11c76b)
> - src/log/catchup.hpp (PRE-CREATION)
> - src/log/catchup.cpp (PRE-CREATION)
> - src/log/consensus.hpp (PRE-CREATION)
> - src/log/consensus.cpp (PRE-CREATION)
> - src/log/coordinator.hpp (3f6fb7c)
> - src/log/coordinator.cpp (6e6466f)
> - src/log/log.hpp (77edc7a)
> - src/log/network.hpp (d34cf78)
> - src/log/replica.hpp (d1f5ead)
> - src/log/replica.cpp (59a6ff3)
> - src/messages/log.proto (3d5859f)
> - src/tests/log_tests.cpp (ff5f86c)
>
> View Diff <https://reviews.apache.org/r/14631/diff/>
>
Re: Review Request 14631: Catch-up Replicated Log 1: decoupled coordinator
logics and made them asynchronous.
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27506
-----------------------------------------------------------
It would be great to include some documentation that lists all possible proposers for the replicas.
src/log/log.hpp
<https://reviews.apache.org/r/14631/#comment53434>
Any reason not to do this in the initializer list above?
src/log/log.hpp
<https://reviews.apache.org/r/14631/#comment53433>
Any reason not to do these in the initializer list above? I.e.,
group(new zookeeper::Group(servers, timeout, znode, auth)),
quorum(_quorum),
replica(new Replica(path)),
network(new ZooKeeperNetwork(servers, timeout, znode, auth))
{
src/log/log.hpp
<https://reviews.apache.org/r/14631/#comment53435>
How about a bit more context? I.e., "We store a Group instance in order to continually renew the replicas membership (when using ZooKeeper)."
src/log/network.hpp
<https://reviews.apache.org/r/14631/#comment53436>
Does it need to be a pointer? Also, can ZooKeeperNetwork be copied? Do we want it to be copyable?
src/log/replica.hpp
<https://reviews.apache.org/r/14631/#comment53437>
Let's put each parameter on it's own line.
src/log/replica.hpp
<https://reviews.apache.org/r/14631/#comment53438>
Each parameter on it's own line please.
src/log/replica.cpp
<https://reviews.apache.org/r/14631/#comment53441>
s/coordinator/proposer/
src/messages/log.proto
<https://reviews.apache.org/r/14631/#comment53439>
s/to a coordinator//
src/messages/log.proto
<https://reviews.apache.org/r/14631/#comment53440>
s/coordinator/proposer/
- Benjamin Hindman
On Nov. 1, 2013, 12:18 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
>
> (Updated Nov. 1, 2013, 12:18 a.m.)
>
>
> Review request for mesos and Benjamin Hindman.
>
>
> Bugs: MESOS-736
> https://issues.apache.org/jira/browse/MESOS-736
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first patch of a series of patches that implement a catch-up mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
>
> Here is a brief summary of this patch: (Sorry for the fact that we are not able to break it into smaller patches :()
>
> 1) Pulled the original Coordinator logic out and divides it into several Paxos phases (see src/log/consensus.hpp). Instead of using a blocking semantics, we implemented all the logics asynchronously.
>
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry logic by bumping the proposal number. This also requires us to slightly change the existing replica protocol.
>
> 3) Made the "fill" operation independent of the underlying replica. Instead, introduced a catchup (see src/log/catchup.hpp) function to make sure the underlying local replica has learned each write.
>
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
>
> This is a joint work with Yan Xu.
>
>
> Diffs
> -----
>
> src/Makefile.am a11c76b
> src/log/catchup.hpp PRE-CREATION
> src/log/catchup.cpp PRE-CREATION
> src/log/consensus.hpp PRE-CREATION
> src/log/consensus.cpp PRE-CREATION
> src/log/coordinator.hpp 3f6fb7c
> src/log/coordinator.cpp 6e6466f
> src/log/log.hpp 77edc7a
> src/log/network.hpp d34cf78
> src/log/replica.hpp d1f5ead
> src/log/replica.cpp 59a6ff3
> src/messages/log.proto 3d5859f
> src/tests/log_tests.cpp ff5f86c
>
> Diff: https://reviews.apache.org/r/14631/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* --gtest_repeat=100
>
>
> Thanks,
>
> Jie Yu
>
>