You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Adam B <ad...@mesosphere.io> on 2014/01/11 02:51:06 UTC

Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Jan. 10, 2014, 5:51 p.m.)


Review request for mesos, Benjamin Hindman and Niklas Nielsen.


Changes
-------

Updated based on review feedback; added mesos public group to reviewers.


Summary (updated)
-----------------

Added completed frameworks/tasks to slave re-registration.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description (updated)
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation (jiras to-be-filed):
- Framework.id is set, but Framework.FrameworkInfo.id is not.
BenH left a TODO in master.hpp suggesting we move to 'info.id
- A framework's last task may remain in the completedExecutor's
terminatedTasks instead of moving to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves.


Diffs (updated)
-----

  src/master/master.hpp 95b9cec 
  src/master/master.cpp 38c5532 
  src/messages/messages.proto 1f264d5 
  src/slave/slave.cpp 396293b 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 29, 2014, 4:58 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 712
> > <https://reviews.apache.org/r/16724/diff/4/?file=454079#file454079line712>
> >
> >     why the different log level?
> >     
> >     s/2/1/?

I figured that the different log levels could be used for different verbosity levels, such that in this case VLOG(1) gives one statement per framework, VLOG(2) gives one per executor, and VLOG(3) gives one statement per task. Seeing that all of mesos only has 5 VLOG(2)'s and 1 VLOG(3), all in sched.cpp, we're really not using VLOG levels much yet.
I'm happy to change these all over to VLOG(1) if desired, but I would like to open up a dialogue about our debug/vlog log-level policy.


> On Jan. 29, 2014, 4:58 p.m., Vinod Kone wrote:
> > src/master/master.hpp, line 144
> > <https://reviews.apache.org/r/16724/diff/4/?file=454076#file454076line144>
> >
> >     we tend to not add "_" suffixes to params in declarations.

Interesting. Seems a little weird to me to have the declaration without '_' and the implementation with it, but it does make the header look cleaner. Fixed.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review33187
-----------------------------------------------------------


On Jan. 29, 2014, 4:17 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 29, 2014, 4:17 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 29, 2014, 4:58 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 712
> > <https://reviews.apache.org/r/16724/diff/4/?file=454079#file454079line712>
> >
> >     why the different log level?
> >     
> >     s/2/1/?
> 
> Adam B wrote:
>     I figured that the different log levels could be used for different verbosity levels, such that in this case VLOG(1) gives one statement per framework, VLOG(2) gives one per executor, and VLOG(3) gives one statement per task. Seeing that all of mesos only has 5 VLOG(2)'s and 1 VLOG(3), all in sched.cpp, we're really not using VLOG levels much yet.
>     I'm happy to change these all over to VLOG(1) if desired, but I would like to open up a dialogue about our debug/vlog log-level policy.

Changed the VLOG(3)'s to VLOG(2)'s, since they're more verbose than necessary for a VLOG(1), but could still be useful to someone debugging deep into this code. Seems like there's no real policy, so I'll just stick to two VLOG levels for now.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review33187
-----------------------------------------------------------


On Feb. 27, 2014, 12:28 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2014, 12:28 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review33187
-----------------------------------------------------------

Ship it!



src/master/master.hpp
<https://reviews.apache.org/r/16724/#comment62519>

    we tend to not add "_" suffixes to params in declarations.



src/master/master.hpp
<https://reviews.apache.org/r/16724/#comment62520>

    ditto



src/master/master.hpp
<https://reviews.apache.org/r/16724/#comment62521>

    ditto



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62525>

    How about
    
    LOG(INFO) << "Re-add completed framework " 
              << completedFramework_.framework_info().id()
              << " from slave " << slave-id << " ("
              << slave->info.hostname() << ")" ;



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62526>

    s/Registering/Re-adding/



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62528>

    s/Registering/Re-adding/



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62522>

    why the different log level?
    
    s/2/1/?



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62523>

    s/3/1/



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62524>

    s/3/1/


- Vinod Kone


On Jan. 30, 2014, 12:17 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2014, 12:17 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 712
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line712>
> >
> >     Why do you need to AWAIT on this?

Copy/paste coding, sanity-checking overkill as I was trying to get this stuff figured out. Removed.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/mesos.hpp, lines 239-252
> > <https://reviews.apache.org/r/16724/diff/6/?file=492073#file492073line239>
> >
> >     Why not augment createTask() above?
> >     
> >     Also, it seems a bit weird to use DEFAULT_EXECUTOR_INFO as Task.ExecutorInfo.

If I use createTask() as is, I get "Failed to launch executor <foo> of framework <bar> because it is unknown to the isolator".
If I add "task.mutable_executor()->CopyFrom(DEFAULT_EXECUTOR_INFO);" after calling createTask, I get TASK_LOST ("TaskInfo must have either an 'executor' or a 'command'").
>From the declaration of TaskInfo in mesos.proto, "Either ExecutorInfo or CommandInfo should be set" (not both), so I think it makes sense to have createTaskWithCommand() and createTaskWithExecutor(). Since I'm just using the DEFAULT_EXECUTOR_INFO, I have renamed my function createTaskWithDefaultExecutor(), but we could generalize this to take in an (optional) ExecutorInfo.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 788-789
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line788>
> >
> >     ditto.

Turns out I don't need either of these. Removed.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 786-787
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line786>
> >
> >     Do you need to wait on both?
> >     
> >     If no, kill one of them.

Turns out I need both of these.
Executor::shutdown() is called, and ought to be expected, else we get "Uninteresting mock function call".
Slave::executorTerminated is the call I need to wait/settle on, since it is what actually calls removeFramework to put the framework in completedFrameworks.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 666
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line666>
> >
> >     const string&

Changed to const string (no ref), since std::string.substr() returns a new string, and we shouldn't return a reference to a local variable.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34767
-----------------------------------------------------------


On Feb. 17, 2014, 4:23 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2014, 4:23 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/mesos.hpp, lines 239-252
> > <https://reviews.apache.org/r/16724/diff/6/?file=492073#file492073line239>
> >
> >     Why not augment createTask() above?
> >     
> >     Also, it seems a bit weird to use DEFAULT_EXECUTOR_INFO as Task.ExecutorInfo.
> 
> Adam B wrote:
>     If I use createTask() as is, I get "Failed to launch executor <foo> of framework <bar> because it is unknown to the isolator".
>     If I add "task.mutable_executor()->CopyFrom(DEFAULT_EXECUTOR_INFO);" after calling createTask, I get TASK_LOST ("TaskInfo must have either an 'executor' or a 'command'").
>     From the declaration of TaskInfo in mesos.proto, "Either ExecutorInfo or CommandInfo should be set" (not both), so I think it makes sense to have createTaskWithCommand() and createTaskWithExecutor(). Since I'm just using the DEFAULT_EXECUTOR_INFO, I have renamed my function createTaskWithDefaultExecutor(), but we could generalize this to take in an (optional) ExecutorInfo.

This code was also moved into fault_tolerance_tests.cpp. If it becomes valuable to other tests, then we can revisit the design of createTask() to include an ExecutorInfo or CommandInfo.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34767
-----------------------------------------------------------


On Feb. 18, 2014, 6:07 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 6:07 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34767
-----------------------------------------------------------



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65021>

    kill newline.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65023>

    const string&



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65024>

    s/+/ + /



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65025>

    new line.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65067>

    Kill this.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65059>

    Why do you need to AWAIT on this?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65068>

    Kill this.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65060>

    Why AWAIT on this?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65063>

    s/Framework/framework/



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65064>

    s/execShutdown/shutdown/



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65076>

    Do you need to wait on both?
    
    If no, kill one of them.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65077>

    ditto.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment65078>

    s/Framework/framework/



src/tests/mesos.hpp
<https://reviews.apache.org/r/16724/#comment65065>

    Why not augment createTask() above?
    
    Also, it seems a bit weird to use DEFAULT_EXECUTOR_INFO as Task.ExecutorInfo.


- Vinod Kone


On Feb. 18, 2014, 12:23 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 12:23 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Feb. 18, 2014, 8:12 a.m., Niklas Nielsen wrote:
> > src/master/master.cpp, line 2889
> > <https://reviews.apache.org/r/16724/diff/6/?file=492069#file492069line2889>
> >
> >     Where does completedFramework clash (since the '_')?

I guess completedFramework doesn't actually clash here, but I thought it would be nice to iterate foreach completedFramework_ in completedFrameworks_ (clash), especially since we then pass it to readdCompletedFramework which has completedFramework_ as its parameter, so that readdCompletedFramework can iterate foreach completedFramework (clash) in completedFrameworks.
I can remove the '_' from this completedFramework_ if you want, but I feel like it reads better this way.


> On Feb. 18, 2014, 8:12 a.m., Niklas Nielsen wrote:
> > src/messages/messages.proto, line 369
> > <https://reviews.apache.org/r/16724/diff/6/?file=492070#file492070line369>
> >
> >     Is this comment obsolete?

No, BenH wanted to actually move this into a separate file in a new mesos.internal.archive package, but that can be done when the rest of the Archive work comes along.


> On Feb. 18, 2014, 8:12 a.m., Niklas Nielsen wrote:
> > src/tests/mesos.hpp, lines 247-249
> > <https://reviews.apache.org/r/16724/diff/6/?file=492073#file492073line247>
> >
> >     Any need to MergeFrom? If not, then use CopyFrom :)

Not that I know of. Fixed the MergeFrom's in createTask as well.


> On Feb. 18, 2014, 8:12 a.m., Niklas Nielsen wrote:
> > src/tests/fault_tolerance_tests.cpp, line 666
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line666>
> >
> >     s/Json/JSON/?

I could find nothing in the Mesos/Google style guides about naming conventions with acronyms, so I went with a previous rule I've used. 
A quick google search yields the following recommendations:
>From Microsoft: http://msdn.microsoft.com/en-us/library/vstudio/ms229043(v=vs.100).aspx
"Do capitalize only the first character of acronyms with three or more characters, except the first word of a camel-cased identifier."
>From Geosoft (who?): http://geosoft.no/development/cppstyle.html
"9. Abbreviations and acronyms must not be uppercase when used as name [4]."
which actually cites Todd Hoff: http://www.possibility.com/Cpp/CppCodingStandard.html#noabbrev
"Take for example NetworkABCKey. Notice how the C from ABC and K from key are confused. Some people don't mind this and others just hate it..."

Looking in the Mesos code, I can find examples of both capitalization styles:
3rdparty/libprocess/3rdparty/stout/tests/json_tests.cpp:13:TEST(JsonTest, BinaryData)
3rdparty/libprocess/src/process.cpp:2931:      struct JSONVisitor : EventVisitor

I can go either way, but I find it easier to read when the acronym doesn't bleed into the next capitalized letter.


> On Feb. 18, 2014, 8:12 a.m., Niklas Nielsen wrote:
> > src/tests/mesos.hpp, lines 239-252
> > <https://reviews.apache.org/r/16724/diff/6/?file=492073#file492073line239>
> >
> >     You only use helper once, so how about moving it to fault_tolerance_tests.cpp (like findJsonValue)?

Sure. Moved it, and renamed to createTaskWithDefaultExecutor (since it doesn't actually take an Executor parameter).
If somebody else finds themselves needing it in another test, they can move it back to mesos.hpp.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34675
-----------------------------------------------------------


On Feb. 17, 2014, 4:23 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2014, 4:23 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34675
-----------------------------------------------------------

Ship it!



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment64893>

    Where does completedFramework clash (since the '_')?



src/messages/messages.proto
<https://reviews.apache.org/r/16724/#comment64894>

    Is this comment obsolete?



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment64892>

    Fix indentation :)



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment64891>

    Would you mind adding a period at the end of you test comments (for the entire test)? :)



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment64898>

    s/Json/JSON/?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment64901>

    Can you scan this file and wrap at 80?



src/tests/mesos.hpp
<https://reviews.apache.org/r/16724/#comment64896>

    You only use helper once, so how about moving it to fault_tolerance_tests.cpp (like findJsonValue)?



src/tests/mesos.hpp
<https://reviews.apache.org/r/16724/#comment64903>

    Any need to MergeFrom? If not, then use CopyFrom :)


- Niklas Nielsen


On Feb. 17, 2014, 4:23 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2014, 4:23 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Feb. 28, 2014, 12:50 p.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Moved additional issues out of Description, and into Testing so they don't clog up the commit message.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description (updated)
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.


Diffs
-----

  include/mesos/scheduler.hpp 55db177 
  src/master/master.hpp 72525d2 
  src/master/master.cpp 2e86a19 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 4f5349b 
  src/tests/fault_tolerance_tests.cpp 59632b0 
  src/tests/mesos.hpp 018d4ff 

Diff: https://reviews.apache.org/r/16724/diff/


Testing (updated)
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
Added a new unit/integration test to verify the expected behavior.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Feb. 27, 2014, 2:32 p.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Rebased and updated in response to BenH's feedback.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs (updated)
-----

  include/mesos/scheduler.hpp 55db177 
  src/master/master.hpp 72525d2 
  src/master/master.cpp 2e86a19 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 4f5349b 
  src/tests/fault_tolerance_tests.cpp 59632b0 
  src/tests/mesos.hpp 018d4ff 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
Added a new unit/integration test to verify the expected behavior.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Feb. 27, 2014, 8:28 p.m.)


Review request for mesos and Benjamin Hindman.


Changes
-------

Made benh be accountable for this review.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs
-----

  include/mesos/scheduler.hpp 2e4707e 
  src/master/master.hpp 7649737 
  src/master/master.cpp 77872ec 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 2d21e16 
  src/tests/fault_tolerance_tests.cpp 60e06cc 
  src/tests/mesos.hpp d7bdaee 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
Added a new unit/integration test to verify the expected behavior.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Feb. 27, 2014, 12:26 p.m., Benjamin Hindman wrote:
> > src/tests/fault_tolerance_tests.cpp, line 665
> > <https://reviews.apache.org/r/16724/diff/7/?file=497356#file497356line665>
> >
> >     Please move '{' to newline and use const & for parameters (and did you really want to return a 'const string'?).
> >     
> >     Also, please add a healthier comment that no only should we "use use a real JSON parser" but this will not return the entire value in the event that the value is an object or an array since it will stop at the first comma.

'{' moved to newline, parameters const&'d.
As for the return value, I've changed the function to bool isJsonValueEmpty(text, key) to ensure that it doesn't get misused under the assumption that it returns a full value. Added a comment in the implementation (doesn't need to be a function comment since the function only returns bool now).


> On Feb. 27, 2014, 12:26 p.m., Benjamin Hindman wrote:
> > src/tests/fault_tolerance_tests.cpp, line 759
> > <https://reviews.apache.org/r/16724/diff/7/?file=497356#file497356line759>
> >
> >     Is this supposed to be master.get() or slave.get()? Also, there is a lot of code here, can you put a newline between the masterState "stanza" and the slaveState "stanza" to make it more readable?

Good catch. Thanks! Fixed all instances.


> On Feb. 27, 2014, 12:26 p.m., Benjamin Hindman wrote:
> > src/tests/fault_tolerance_tests.cpp, line 672
> > <https://reviews.apache.org/r/16724/diff/7/?file=497356#file497356line672>
> >
> >     Can you leave a comment as to why you've created this here, how it differs from 'createTask' in mesos.hpp and perhaps even a TODO to move this into mesos.hpp? My guess is that you could easily overload 'createTask' in mesos.hpp and you get this behavior or you get the 'command' behavior if you pass a 'command' string.

An ExecutorInfo has a 'command' string too, but also an ExecutorID, so I've merged createTaskWith[Default]Executor into createTask by adding an extra "const Option<mesos::ExecutorID>& executorId = None()," parameter. If isNone(), the we use a CommandInfo(command), otherwise we use an ExecutorInfo(executorId, command). Now I can use "createTask(offers.get()[0], "exit 1", DEFAULT_EXECUTOR_ID);" to get the DEFAULT_EXECUTOR_INFO behavior.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review35689
-----------------------------------------------------------


On Feb. 27, 2014, 12:28 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2014, 12:28 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review35689
-----------------------------------------------------------

Ship it!


Test has a few small cleanups needed.


src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment66404>

    We don't try and wrap by putting the brace on a newline, please pull the brace back and wrap after the comma:
    
    foreach (const Archive::Framework& completedFramework_,
             completedFrameworks_) {
    ...



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment66405>

    See wrapping comment above.



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment66406>

    See wrapping comment above please.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66384>

    Please move '{' to newline and use const & for parameters (and did you really want to return a 'const string'?).
    
    Also, please add a healthier comment that no only should we "use use a real JSON parser" but this will not return the entire value in the event that the value is an object or an array since it will stop at the first comma.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66385>

    s/idx/index/



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66390>

    Can you leave a comment as to why you've created this here, how it differs from 'createTask' in mesos.hpp and perhaps even a TODO to move this into mesos.hpp? My guess is that you could easily overload 'createTask' in mesos.hpp and you get this behavior or you get the 'command' behavior if you pass a 'command' string.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66396>

    Wrap this please (see other tests).



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66397>

    Do you need 'framework'? Why not just:
    
    EXPECT_NE("", frameworkId.get().value());



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66401>

    Is this supposed to be master.get() or slave.get()? Also, there is a lot of code here, can you put a newline between the masterState "stanza" and the slaveState "stanza" to make it more readable?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66400>

    Is this supposed to be master.get() or slave.get()? Also, add a newline here like above.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/16724/#comment66380>

    Go ahead and put each argument on a newline when you wrap.


- Benjamin Hindman


On Feb. 19, 2014, 2:07 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2014, 2:07 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Feb. 18, 2014, 6:07 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.


Changes
-------

Rebase and respond to review feedback.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs (updated)
-----

  include/mesos/scheduler.hpp 2e4707e 
  src/master/master.hpp 7649737 
  src/master/master.cpp 77872ec 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 2d21e16 
  src/tests/fault_tolerance_tests.cpp 60e06cc 
  src/tests/mesos.hpp d7bdaee 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
Added a new unit/integration test to verify the expected behavior.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Feb. 17, 2014, 4:23 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.


Changes
-------

Added a unit test that starts a task/framework, then kills the task and shuts down the framework, leaving a completedFramework on the slave. After restarting the master, the slave reregisters with the new master and the completed framework is added to the new master's state. 
Master/slave state is read using find/substr on the state.json endpoint of each. A better approach would use a real json parser to get at the nested elements (to verify the executor/task status), and an even better approach would make the test a friend of Master/Slave so it can read the frameworks/completedFrameworks collections directly.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs (updated)
-----

  include/mesos/scheduler.hpp 2e4707e 
  src/master/master.hpp 7649737 
  src/master/master.cpp 77872ec 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 2d21e16 
  src/tests/fault_tolerance_tests.cpp 60e06cc 
  src/tests/mesos.hpp d7bdaee 

Diff: https://reviews.apache.org/r/16724/diff/


Testing (updated)
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
Added a new unit/integration test to verify the expected behavior.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Feb. 7, 2014, 3:56 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.


Changes
-------

Minor updates based on review feedback. No need for re-review.
Unit test development still in progress.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs (updated)
-----

  src/master/master.hpp 7649737 
  src/master/master.cpp 77872ec 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 2d21e16 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Jan. 29, 2014, 4:17 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.


Changes
-------

Updated diff based on BenH and Vinod's feedback.
Unit test still in development, not included.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs (updated)
-----

  src/master/master.hpp 7649737 
  src/master/master.cpp 77872ec 
  src/messages/messages.proto 922a8c4 
  src/slave/slave.cpp 2d21e16 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 27, 2014, 10:44 p.m., Vinod Kone wrote:
> > Can we add a test for this please?

Certainly. I'll jump on that this week. Thanks for reviewing!


> On Jan. 27, 2014, 10:44 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1367
> > <https://reviews.apache.org/r/16724/diff/3/?file=425217#file425217line1367>
> >
> >     s/slaveCompletedFrameworks/completedFrameworks/

'completedFrameworks' is already a member variable of Master. How about 'completedFrameworks_'?


> On Jan. 27, 2014, 10:44 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 2748
> > <https://reviews.apache.org/r/16724/diff/3/?file=425217#file425217line2748>
> >
> >     new line.

Comment removed after making time optional to Framework(). No further need for a newline.


> On Jan. 27, 2014, 10:44 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 725-728
> > <https://reviews.apache.org/r/16724/diff/3/?file=425219#file425219line725>
> >
> >     I think you've brought this up before but did you figure out why a completed executor has terminated tasks?

Not exactly, not yet. I'll look into this as I'm writing and running tests.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review32961
-----------------------------------------------------------


On Jan. 16, 2014, 4:24 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 4:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 27, 2014, 10:44 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 725-728
> > <https://reviews.apache.org/r/16724/diff/3/?file=425219#file425219line725>
> >
> >     I think you've brought this up before but did you figure out why a completed executor has terminated tasks?
> 
> Adam B wrote:
>     Not exactly, not yet. I'll look into this as I'm writing and running tests.

Reproduced the flakiness in a unit test; smells like a race condition. I'll dig into it further as a part of MESOS-906.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review32961
-----------------------------------------------------------


On Feb. 18, 2014, 6:07 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 6:07 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review32961
-----------------------------------------------------------


Can we add a test for this please?


src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62036>

    not yours but do you mind s/it's/its/



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62035>

    s/slaveCompletedFrameworks/completedFrameworks/



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62037>

    +1
    
    s/slaveCompletedFrameworks/completedFrameworks/



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62038>

    new line.



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62050>

    kill the log line or change it to VLOG.



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62049>

    Kill the log line or change it to VLOG.



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62040>

    s/id//



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62043>

    I'm not sure we want to LOG these. We don't even log the active frameworks/executors/tasks above.
    
    Maybe VLOG if you want use it for debugging.
    



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62042>

    s/MergeFrom/CopyFrom/



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62044>

    ditto about logging.



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62046>

    I think you've brought this up before but did you figure out why a completed executor has terminated tasks?



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62045>

    ditto about logging.


- Vinod Kone


On Jan. 17, 2014, 12:24 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 17, 2014, 12:24 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 28, 2014, 9:59 a.m., Vinod Kone wrote:
> > Did you forget to update the diff? I'm planning to cut 0.17.0-rc1 in an hour or so. If you want this patch to be included in that release please update it before then.

Just marked things fixed as I made the edits at home. Building/testing now, but don't wait for me for rc1. I still have to write a new unit test.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review33008
-----------------------------------------------------------


On Jan. 16, 2014, 4:24 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 4:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review33008
-----------------------------------------------------------


Did you forget to update the diff? I'm planning to cut 0.17.0-rc1 in an hour or so. If you want this patch to be included in that release please update it before then.

- Vinod Kone


On Jan. 17, 2014, 12:24 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 17, 2014, 12:24 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 27, 2014, 8:54 p.m., Benjamin Hindman wrote:
> > src/master/master.cpp, line 2657
> > <https://reviews.apache.org/r/16724/diff/3/?file=425217#file425217line2657>
> >
> >     Let's use the same parameter name here and in the declaration please (same for Master::readdCompletedFramework too).

Changing to 'completedFrameworks_' since Master.completedFrameworks exists; updated the declarations as well.


> On Jan. 27, 2014, 8:54 p.m., Benjamin Hindman wrote:
> > src/master/master.cpp, line 2723
> > <https://reviews.apache.org/r/16724/diff/3/?file=425217#file425217line2723>
> >
> >     We align our wrapped "lists" (parameter lists, argument lists, etc).

Moot since this now fits on one line after the rename, but I understand your point.


> On Jan. 27, 2014, 8:54 p.m., Benjamin Hindman wrote:
> > src/master/master.cpp, line 2739
> > <https://reviews.apache.org/r/16724/diff/3/?file=425217#file425217line2739>
> >
> >     s/knownFramework/completedFramework/

Changed 'knownFramework' to 'completedFramework'. The alternate naming came from a desire to distinguish (a) the completedFramework_ being re-added by the slave from (b) the completedFramework iterator into the master's collection of known completedFrameworks, from (c) the framework object we end up adding the completedTasks to.


> On Jan. 27, 2014, 8:54 p.m., Benjamin Hindman wrote:
> > src/master/master.cpp, line 2748
> > <https://reviews.apache.org/r/16724/diff/3/?file=425217#file425217line2748>
> >
> >     Let's make this a TODO to make registration and reregistration time be optional in Framework.

No need for a TODO. Made time an optional parameter to Master::Framework(). (Slave::Framework does not track time.)


> On Jan. 27, 2014, 8:54 p.m., Benjamin Hindman wrote:
> > src/messages/messages.proto, line 371
> > <https://reviews.apache.org/r/16724/diff/3/?file=425218#file425218line371>
> >
> >     I was thinking we'd put this in the 'archive' package since we'll likely call a class 'Archive', but this is fine for now and we can always rename it later without causing upgrade/compatibility issues.

Added a TODO.


> On Jan. 27, 2014, 8:54 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 708
> > <https://reviews.apache.org/r/16724/diff/3/?file=425219#file425219line708>
> >
> >     Please kill all the 'msg' prefixes here, they're not necessary and not with convention.

s/msgCompletedFramework/completedFramework_/ (since we already have a 'completedFramework' iterator into Slave.completedFrameworks)
s/msgFrameworkInfo/frameworkInfo/


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review32949
-----------------------------------------------------------


On Jan. 16, 2014, 4:24 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 4:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review32949
-----------------------------------------------------------

Ship it!



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62002>

    Let's use the same parameter name here and in the declaration please (same for Master::readdCompletedFramework too).



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment61999>

    We align our wrapped "lists" (parameter lists, argument lists, etc).



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62004>

    = None();



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62003>

    s/knownFramework/completedFramework/



src/master/master.cpp
<https://reviews.apache.org/r/16724/#comment62005>

    Let's make this a TODO to make registration and reregistration time be optional in Framework.



src/messages/messages.proto
<https://reviews.apache.org/r/16724/#comment62008>

    I was thinking we'd put this in the 'archive' package since we'll likely call a class 'Archive', but this is fine for now and we can always rename it later without causing upgrade/compatibility issues.



src/messages/messages.proto
<https://reviews.apache.org/r/16724/#comment62011>

    Can we make this optional? I can see moving something like 'pid' into another structure and I don't think it's that big of a deal if it's not present (it's only used for constructing a completed 'Framework' and not having the pid is not a big deal.



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62014>

    Please kill all the 'msg' prefixes here, they're not necessary and not with convention.



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment62013>

    Please align the wrapping here.


- Benjamin Hindman


On Jan. 17, 2014, 12:24 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 17, 2014, 12:24 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Jan. 16, 2014, 4:24 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.


Changes
-------

- Moved ReregisterSlaveMessage::CompletedFramework to Archive::Framework, based on discussion with BenH. (BenH: Is this what you were thinking?)
- Filed JIRAs for new issues discovered.
- Fixed "TODO(adam-mesos):" format


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description (updated)
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves. 
BenH proposes an archive to replace these circular buffers.


Diffs (updated)
-----

  src/master/master.hpp 95b9cec 
  src/master/master.cpp 38c5532 
  src/messages/messages.proto 1f264d5 
  src/slave/slave.cpp 396293b 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.

> On Jan. 13, 2014, 2:52 p.m., Joe Smith wrote:
> > src/slave/slave.cpp, line 731
> > <https://reviews.apache.org/r/16724/diff/2/?file=420459#file420459line731>
> >
> >     super naive question: is there a way we can add tests for this new chunk of code? would be great to lock-in (via tests) that the completed/finished frameworks get re-registered :)

Excellent point. I do want to add a unit test for this. I will familiarize myself with the mesos unit-testing framework and update the diff when my new test is ready. Thanks!


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review31665
-----------------------------------------------------------


On Jan. 16, 2014, 4:24 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 4:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Joe Smith <ya...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review31665
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment60362>

    super naive question: is there a way we can add tests for this new chunk of code? would be great to lock-in (via tests) that the completed/finished frameworks get re-registered :)


- Joe Smith


On Jan. 13, 2014, 11:18 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 13, 2014, 11:18 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation (jiras to-be-filed):
> - Framework.id is set, but Framework.FrameworkInfo.id is not.
> BenH left a TODO in master.hpp suggesting we move to 'info.id
> - A framework's last task may remain in the completedExecutor's
> terminatedTasks instead of moving to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------

(Updated Jan. 13, 2014, 11:18 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.


Changes
-------

Adding Vinod and BenM to the reviewers.


Bugs: MESOS-767
    https://issues.apache.org/jira/browse/MESOS-767


Repository: mesos-git


Description
-------

Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.

Additional issues discovered during investigation (jiras to-be-filed):
- Framework.id is set, but Framework.FrameworkInfo.id is not.
BenH left a TODO in master.hpp suggesting we move to 'info.id
- A framework's last task may remain in the completedExecutor's
terminatedTasks instead of moving to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves.


Diffs
-----

  src/master/master.hpp 95b9cec 
  src/master/master.cpp 38c5532 
  src/messages/messages.proto 1f264d5 
  src/slave/slave.cpp 396293b 

Diff: https://reviews.apache.org/r/16724/diff/


Testing
-------

make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.


Thanks,

Adam B


Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review31628
-----------------------------------------------------------

Ship it!


Looks great Adam!


src/master/master.hpp
<https://reviews.apache.org/r/16724/#comment60312>

    Can you add a ':' after (adam-mesos)?



src/slave/slave.cpp
<https://reviews.apache.org/r/16724/#comment60314>

    ':' again.


- Niklas Nielsen


On Jan. 10, 2014, 5:51 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2014, 5:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation (jiras to-be-filed):
> - Framework.id is set, but Framework.FrameworkInfo.id is not.
> BenH left a TODO in master.hpp suggesting we move to 'info.id
> - A framework's last task may remain in the completedExecutor's
> terminatedTasks instead of moving to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 95b9cec 
>   src/master/master.cpp 38c5532 
>   src/messages/messages.proto 1f264d5 
>   src/slave/slave.cpp 396293b 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its completed frameworks, web UI shows completed tasks and stdout/stderr.
> 
> 
> Thanks,
> 
> Adam B
> 
>