You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2013/11/22 21:30:58 UTC

Re: Review Request 15802: Catch-up Replicated Log 3.2: Added log recovery support.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Nov. 22, 2013, 8:30 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Updated the title.


Summary (updated)
-----------------

Catch-up Replicated Log 3.2: Added log recovery support.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs
-----

  src/Makefile.am feda34b 
  src/log/log.hpp 77edc7a 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3.1: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 11:39 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Rebased.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am cf0c8c6 
  src/common/type_utils.hpp 3b05751 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3.1: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.

> On Jan. 16, 2014, 4:18 p.m., Benjamin Hindman wrote:
> > src/log/recover.cpp, line 152
> > <https://reviews.apache.org/r/15802/diff/9/?file=424570#file424570line152>
> >
> >     It's interesting that you don't use Network::watch here but that you do retry down below if broadcast returns an empty set. Do you want to use watch here too?

Added a new review: Catch-up Replicated Log 4.3.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review32033
-----------------------------------------------------------


On Jan. 16, 2014, 6:40 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 6:40 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf0c8c6 
>   src/common/type_utils.hpp 3b05751 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3.1: Added log recovery support.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review32033
-----------------------------------------------------------

Ship it!



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment60759>

    It's interesting that you don't use Network::watch here but that you do retry down below if broadcast returns an empty set. Do you want to use watch here too?


- Benjamin Hindman


On Jan. 16, 2014, 6:40 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 6:40 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf0c8c6 
>   src/common/type_utils.hpp 3b05751 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3.1: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 6:40 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Rebase. Addressed BenH's comments.

Removed 'strict' flag and 'STARTING' status.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am cf0c8c6 
  src/common/type_utils.hpp 3b05751 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3.1: Added log recovery support.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review31934
-----------------------------------------------------------



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment60692>

    Scenarios:
    
    (1) A and B and C all in VOTING, adding D, quorum of size 2:
    
    D goes into RECOVERING, recovers then goes into VOTING.
    
    (2) A and B in VOTING, adding D, quorum size of 2:
    
    D goes into RECOVERING, recovers then goes into VOTING.
    
    (3) A in VOTING, adding D, quorum size of 2:
    
    D tries to recover forever.
    
    (4) A in VOTING and B in EMPTY, adding D, quorum of size 2:
    
    D tries to recover forever.
    
    (5) A in VOTING and B in STARTING, adding D, quorum of size 2:
    
    D goes into VOTING, does not recover from A, B does the same thing.
    
    (6) A in VOTING and adding B which was initailized with STARTING, quorum of size 2:
    
    B goes into VOTING, does not recover from A.


- Benjamin Hindman


On Dec. 21, 2013, 9:11 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 21, 2013, 9:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am ddc43bd 
>   src/common/type_utils.hpp 3b05751 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3.1: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Dec. 21, 2013, 9:11 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Do not allow auto start-up for safety concerns.


Summary (updated)
-----------------

Catch-up Replicated Log 3.1: Added log recovery support.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am ddc43bd 
  src/common/type_utils.hpp 3b05751 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Dec. 17, 2013, 1:22 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Added more comments in recover.cpp to help people understand the correctness.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am ddc43bd 
  src/common/type_utils.hpp 3b05751 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Dec. 13, 2013, 5:37 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Addressed BenH's comments.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am 5f211a2 
  src/common/type_utils.hpp 3b05751 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 10, 2013, 6:38 a.m., Benjamin Hindman wrote:
> > src/log/log.cpp, line 817
> > <https://reviews.apache.org/r/15802/diff/5/?file=394481#file394481line817>
> >
> >     I'd prefer to keep these "timeouts" be of type Timeout not Duration (in all these methods).

One inconsistency I found in the original code is: functions like append/truncate takes a Timeout, while the Log::Writer constructor takes a Duration (I guess in this case, it does't make sense to take a Timeout).

Therefore, to be consistent, I made them all Durations. Any particular reason to use Timeout?


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
-----------------------------------------------------------


On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2013, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am abef3d2 
>   src/java/jni/org_apache_mesos_Log.cpp 36c636d 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 10, 2013, 6:38 a.m., Benjamin Hindman wrote:
> > src/log/log.cpp, line 305
> > <https://reviews.apache.org/r/15802/diff/5/?file=394481#file394481line305>
> >
> >     When would we want '!strict' with the log? I see you commented on why we might not want strict for the replica when writing tests, but if we are creating a log, won't we always want to recover?
> 
> Jie Yu wrote:
>     It's purely for test. If one wants to test Log impl but does not want to create a ZooKeeper server, he cannot do it by creating two Log instances:
>     
>     Log log1(...);
>     Log log2(...);
>     
>     Because we don't know the pid of the replica in log2 when creating log1. So we have to do in this way:
>     
>     Replica replica1(path1, false)
>     
>     Log log2(2, path2, replica1.pid(), false);

Added a comment in log.hpp saying that this flag is only used for testing.

Instead of renaming it to 'recover', I would prefer keeping it consistent with that in Replica (i.e., call it 'strict').


> On Dec. 10, 2013, 6:38 a.m., Benjamin Hindman wrote:
> > src/tests/log_tests.cpp, line 1254
> > <https://reviews.apache.org/r/15802/diff/5/?file=394487#file394487line1254>
> >
> >     It's a bit too bad that this is copied code from TemporaryDirectoryTest. Any way you can see sharing this stuff?

Added a TODO.


> On Dec. 10, 2013, 6:38 a.m., Benjamin Hindman wrote:
> > src/log/recover.cpp, line 185
> > <https://reviews.apache.org/r/15802/diff/5/?file=394483#file394483line185>
> >
> >     This switch needs a lot more explanation. In particular, how do we know we've covered all of the cases? For example, what if we get a quorum of voting and empty? Why don't we do anything in that case?

Added a comment to explain the algorithm.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
-----------------------------------------------------------


On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2013, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am abef3d2 
>   src/java/jni/org_apache_mesos_Log.cpp 36c636d 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 10, 2013, 6:38 a.m., Benjamin Hindman wrote:
> > src/log/log.cpp, line 173
> > <https://reviews.apache.org/r/15802/diff/5/?file=394481#file394481line173>
> >
> >     Rather than holding on to the LogProcess*, how about we make Log::recover() return a Future<Shared<Replica>> and that's how we get the replica. That way, instead of doing process->replica-> we can just have a Shared<Replica>. Then LogProcess::finalize doesn't need to "wait" for Readers as your comment suggests above (or writers, as we'll see below). And does LogProcess::finalize really need to wait for the Shared<Replica> to be owned?

The reason is because I don't want others to be able to access the log (and the underlying log db file) after the Log instance has been destroyed.

For example:

Log* log1 = new Log("/var/libs/log");
...
delete log1;

Log* log2 = new Log("/var/libs/log");
...

When log1 is being deleted, I wanna make sure no one can write the db file '/var/libs/log' after that. If we don't wait for Replica to be owned, then, it's likely someone might still be able to read the file while others might start writing it (e.g. log2).


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
-----------------------------------------------------------


On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2013, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am abef3d2 
>   src/java/jni/org_apache_mesos_Log.cpp 36c636d 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 10, 2013, 6:38 a.m., Benjamin Hindman wrote:
> > src/log/log.cpp, line 305
> > <https://reviews.apache.org/r/15802/diff/5/?file=394481#file394481line305>
> >
> >     When would we want '!strict' with the log? I see you commented on why we might not want strict for the replica when writing tests, but if we are creating a log, won't we always want to recover?

It's purely for test. If one wants to test Log impl but does not want to create a ZooKeeper server, he cannot do it by creating two Log instances:

Log log1(...);
Log log2(...);

Because we don't know the pid of the replica in log2 when creating log1. So we have to do in this way:

Replica replica1(path1, false)

Log log2(2, path2, replica1.pid(), false);


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
-----------------------------------------------------------


On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2013, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am abef3d2 
>   src/java/jni/org_apache_mesos_Log.cpp 36c636d 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
-----------------------------------------------------------



src/log/log.hpp
<https://reviews.apache.org/r/15802/#comment57464>

    Why s/Timeout/Duration/?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57466>

    Pass this into LogProcess::watch instead.



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57644>

    Rather than holding on to the LogProcess*, how about we make Log::recover() return a Future<Shared<Replica>> and that's how we get the replica. That way, instead of doing process->replica-> we can just have a Shared<Replica>. Then LogProcess::finalize doesn't need to "wait" for Readers as your comment suggests above (or writers, as we'll see below). And does LogProcess::finalize really need to wait for the Shared<Replica> to be owned?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57645>

    How about we just grab 'quorum' and 'network' like above get the Shared<Replica> from Log::recover(). Again, this removes our dependency on LogProcess*. I can imagine the LogReaderProcess and LogWriterProcess constructors initiate recovery on Log::recover() and save the returned future and use that to gate anything else (and obviously the future must have been satisfied to use 'replica' since that's how we get it).



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57465>

    s/(UPID)replica/(UPID) replica/



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57641>

    Some comments on why you need to wait for these would be great.



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57642>

    When would we want '!strict' with the log? I see you commented on why we might not want strict for the replica when writing tests, but if we are creating a log, won't we always want to recover?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57467>

    How about a CHECK(replica.unique())? Eventually this is what "release" is for correct? If yes, maybe a TODO?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57468>

    I'd prefer to keep these "timeouts" be of type Timeout not Duration (in all these methods).



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57469>

    You can just do 'future.await(timeout.remaining())' here.



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57488>

    For a future that we don't control we should transition our code to be more robust and move to a model where we don't expect them _not_ to be discarded. Instead, let's do something like:
    
    CHECK(!future.isPending());
    
    if (!future.isReady()) {
      promise.fail("Failed to ...: " + future.isFailed() ? future.failure() : "future discarded";
      ...;
    }
    
    See examples in master/master.cpp.



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57503>

    Let's add helpers for Metadata::Status in common/type_utils.hpp so you can just do:
    
    LOG(INFO) << "Received ..." << response.status() << " status";
    
    This will also enable us to do: stringify(response.status())



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57501>

    How about just std::min? I think it will read better:
    
    if (lowestBeginPosition.isNone()) {
      lowestBeginPosition = response.begin();
    }
    
    lowestBeginPosition = std::min(lowestBeginPosition, response.begin());
    
    Also, what about defining a helper for doing "min" with options:
    
    template <typename T>
    Option<T> min(const Option<T>& left, const Option<T>& right)
    {
      if (left.isSome() && right.isSome()) {
        return std::min(left.get(), right.get());
      } else if (left.isSome()) {
        return left.get();
      } else if (right.isSome()) {
        return right.get();
      }
      return None();
    }
    
    This will make the code just:
    
    lowestBeginPosition = min(lowestBeginPosition, response.begin());



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57502>

    How about just std::max?



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57500>

    This switch needs a lot more explanation. In particular, how do we know we've covered all of the cases? For example, what if we get a quorum of voting and empty? Why don't we do anything in that case?



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57496>

    s/re-calculate/recalculate/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57497>

    s/these/this/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57492>

    s/re-gained/regained/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57494>

    s/Trying to re-gain/Try to regain/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57495>

    Why the delay?



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57462>

    s/replica info (Record::ReplicaInfo)/metadata (Record::Metadata)/



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57486>

    But isn't this the new code? Do we want old code to go into VOTING immediately?



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57463>

    s/replica info/metadata/



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57475>

    Why _metadata instead of metadata_? Also, any reason not to start with '= metadata' and then only do 'set_status'?



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57476>

    Ditto comments above.



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57470>

    What about only having strict apply when the log is empty? As in, if the log is empty and strict is false then just put the replica in voting, otherwise, always stay in non-voting until updated (i.e., via log::recover). I'd like to simplify these semantics with respect to the external log::recover call.



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57471>

    Please put '=' on previous line.



src/tests/log_tests.cpp
<https://reviews.apache.org/r/15802/#comment57472>

    You can only run these tests if Mesos is built with Java. For that reason most ZK tests have been put in a separate file and conditionally included in src/Makefile.am. Some tests might have #define'd included tests too. Take a look in src/tests/*



src/tests/log_tests.cpp
<https://reviews.apache.org/r/15802/#comment57473>

    This should be LOG, then you don't need to worry about checking 'tests::flags.verbose' (unless I'm missing something?).



src/tests/log_tests.cpp
<https://reviews.apache.org/r/15802/#comment57474>

    It's a bit too bad that this is copied code from TemporaryDirectoryTest. Any way you can see sharing this stuff?


- Benjamin Hindman


On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2013, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am abef3d2 
>   src/java/jni/org_apache_mesos_Log.cpp 36c636d 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Dec. 5, 2013, 7:28 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

BenH's suggestions (discussed offline).

Introduced LogReaderProcess and LogWriterProcess.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am abef3d2 
  src/java/jni/org_apache_mesos_Log.cpp 36c636d 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Dec. 4, 2013, 8:02 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

OK. This is much better. Killed 'gate' and '_gate'. Provided the level of indirection in 'recover' and '_recover'.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am abef3d2 
  src/java/jni/org_apache_mesos_Log.cpp 36c636d 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Dec. 4, 2013, 1:16 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

BenH's comments.

Tests:
make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am abef3d2 
  src/java/jni/org_apache_mesos_Log.cpp 36c636d 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/log.cpp d057925 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 82c2157 
  src/messages/log.proto e6460ab 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu


Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/
-----------------------------------------------------------

(Updated Nov. 25, 2013, 5:55 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang Yan Xu.


Changes
-------

Rebased on (Log 2: https://reviews.apache.org/r/14902/).


Summary (updated)
-----------------

Catch-up Replicated Log 3: Added log recovery support.


Bugs: MESOS-736
    https://issues.apache.org/jira/browse/MESOS-736


Repository: mesos-git


Description
-------

This is the last patch of a series of patches that implement catch-up replicated log.

Here is summary of this patch:
1) Introduced RecoverRequest/RecoverResponse.
2) Used Metadata in replica code.
3) A 2-phase empty log start-up algorithm.
4) Added a test to test two competing recover processes.

This is a joint work with Yan Xu.


Diffs (updated)
-----

  src/Makefile.am feda34b 
  src/log/log.hpp 77edc7a 
  src/log/recover.hpp PRE-CREATION 
  src/log/recover.cpp PRE-CREATION 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/messages/log.proto 3d5859f 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/15802/diff/


Testing
-------

make check

bin/mesos-tests.sh --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* --gtest_repeat=100


Thanks,

Jie Yu