You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/11/04 21:37:21 UTC

Re: Review Request 14669: launchTasks on list of offers

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14669/#review28129
-----------------------------------------------------------


Cool! Sorry for taking so long to look at this!

We should dogfood our own API changes, can you send out a subsequent change for all of the test frameworks to update any calls to launchTasks?


include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14669/#comment54715>

    Just an observation, this may prove to be slightly confusing for the users of this API.
    
    If they don't hold on to offers, they can't possibly place two offers into a vector and make a valid call to launchTasks. Perhaps we should clarify that the only way for a framework to have multiple offers for the same slave is if they're holding on to offers.
    
    Seems good for now, in the future, having a document in the /docs/ folder that explains the offer model in Mesos would be great for framework developers.



src/java/src/org/apache/mesos/SchedulerDriver.java
<https://reviews.apache.org/r/14669/#comment54716>

    Curious as to why you used @deprecated earlier but you're not using @deprecated on these two?



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54717>

    Have you seen post-reviews.py? It will allow you to break apart your diffs into multiple patches (for example it looks like you moved the visitors above launchTasks, so in this diff it's difficult to see what's changed in the visitors).
    
    As an example, you could break these into two commits and send two separate reviews very easily with post-reviews.py. :)



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54721>

    s/all together/altogether/



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54722>

    Does the typedef prevent you from directly returning the string, using the implicit Option<T> constructor?
    
    return "Offer " + ...;
    
    Ditto for SlaveChecker and UniqueOfferIDChecker.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54723>

    Does this work with the typedef?
    
    return None();
    
    Ditto for SlaveChecker and UniqueOfferIDChecker.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54726>

    This sounds like UniqueSlaveChecker, although I'm surprised to see this checking for null pointers :(
    
    Can this take const& for all of these? 
    
    Looking at TaskInfoChecker, even though it takes pointers, none of the visitors are checking for NULL and it seems they should be taking const references instead. Feel free to make that change here or leave as is, but for the OfferVisitors can you take const& for the arguments?



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54724>

    " uses slave " + stringify(_slave->id) " and slave " + stringify(slave->id);
    
    Let's not say "expected" here since we're not expecting a particular slave, but rather that only one slave is used.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54725>

    " uses slave " + stringify(_slave->id) " and slave " + stringify(slave->id);
    
    Let's not say "expected" here since we're not expecting a particular slave, but rather that only one slave is used.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54727>

    Looks like an Option<SlaveID> be sufficient here instead of storing the pointer?



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54744>

    What if we broke up the validation steps? 
    
    First the offers are validated, then cleanup the offer visitors after this point (rather than at the very end with the task visitors). If there was an error, we have to remove the offers and return.
    
    Now, we validate the tasks, launching the valid ones. Then, always cleanup the task visitors / allocator / offers.
    
    Let's try to flatten the control flow a bit and avoid the nested break statements.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54728>

    Can you deal with the NULL case first to avoid having so much nesting?
    
    if (framework == NULL) {
      return; // We ignore this request.
    }
    
    vector<OfferID> offerIds;
    
    // Rest of your code.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54730>

    Definitely! Maybe a better place for this kind of TODO is on install() so that others find it more easily?



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54734>

    const OfferID& ?



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54735>

    Let's handle the NULL slave here instead? Would be great if the Visitor can take const references instead of pointers.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54736>

    Does this fit on one line?



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54731>

    Seems like using Shared or Owned here would eliminate the need for manual memory cleanup here.



src/master/master.cpp
<https://reviews.apache.org/r/14669/#comment54732>

    Can you add a CHECK_NOTNULL in removeOffer()?



src/messages/messages.proto
<https://reviews.apache.org/r/14669/#comment54718>

    optional OfferID offer_id = 2; // Deprecated.



src/sched/sched.cpp
<https://reviews.apache.org/r/14669/#comment54720>

    Can you remove this TODO? Let's keep it simple here. If this proves to be a performance issue then we can consider an alternative approach, but my hunch is that the overwriting is simple, clean, and ok for now.


- Ben Mahler


On Oct. 29, 2013, 7:12 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14669/
> -----------------------------------------------------------
> 
> (Updated Oct. 29, 2013, 7:12 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-749
>     https://issues.apache.org/jira/browse/MESOS-749
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Running tasks on more than one offer belonging to a single slave can be useful in situations with multiple out-standing offers.
> 
> This patch extends the usual launchTasks() to accept a vector of OfferIDs. The previous launchTasks (accepting a single OfferID) has been kept for backward compatibility, but this now calls the new launchTasks() with a one-element list.
> This also applied for the JNI and python interfaces, which accepts both formats as well.
> 
> Offers are verified to belong to the same slave and framework, before resources are merged and used.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp fa1ffe8 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 
>   src/java/src/org/apache/mesos/SchedulerDriver.java 93aaa54 
>   src/master/master.hpp 1eba03f 
>   src/master/master.cpp 1147cc6 
>   src/messages/messages.proto a5dded2 
>   src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d 
>   src/sched/sched.cpp 3049096 
>   src/tests/master_tests.cpp bf790d2 
>   src/tests/resource_offers_tests.cpp 2864c9a 
> 
> Diff: https://reviews.apache.org/r/14669/diff/
> 
> 
> Testing
> -------
> 
> Three new tests has been added: LaunchCombinedOfferTest, LaunchAcrossSlavesTest and LaunchDuplicateOfferTest
> This test ensures that:
> 1) Multiple offers can be used to run a single task (requesting the sum of offer resources).
> 2) Offers cannot span multiple slaves.
> 3) No offers can appear more than once in offer list.
> 
> $ make check
> ...
> [ RUN      ] MasterTest.LaunchCombinedOfferTest
> [       OK ] MasterTest.LaunchCombinedOfferTest (2010 ms)
> [ RUN      ] MasterTest.LaunchAcrossSlavesTest
> [       OK ] MasterTest.LaunchAcrossSlavesTest (3 ms)
> [ RUN      ] MasterTest.LaunchDuplicateOfferTest
> [       OK ] MasterTest.LaunchDuplicateOfferTest (3 ms)
> ...
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14669: launchTasks on list of offers

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > Cool! Sorry for taking so long to look at this!
> > 
> > We should dogfood our own API changes, can you send out a subsequent change for all of the test frameworks to update any calls to launchTasks?

Thanks for the review Ben!

You bet, I'll follow up with a new review request.


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > include/mesos/scheduler.hpp, lines 242-249
> > <https://reviews.apache.org/r/14669/diff/13/?file=373620#file373620line242>
> >
> >     Just an observation, this may prove to be slightly confusing for the users of this API.
> >     
> >     If they don't hold on to offers, they can't possibly place two offers into a vector and make a valid call to launchTasks. Perhaps we should clarify that the only way for a framework to have multiple offers for the same slave is if they're holding on to offers.
> >     
> >     Seems good for now, in the future, having a document in the /docs/ folder that explains the offer model in Mesos would be great for framework developers.

Noted.


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/java/src/org/apache/mesos/SchedulerDriver.java, lines 110-122
> > <https://reviews.apache.org/r/14669/diff/13/?file=373623#file373623line110>
> >
> >     Curious as to why you used @deprecated earlier but you're not using @deprecated on these two?

No particular reason. The new patch use @deprecated for those too. 


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1021
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1021>
> >
> >     Have you seen post-reviews.py? It will allow you to break apart your diffs into multiple patches (for example it looks like you moved the visitors above launchTasks, so in this diff it's difficult to see what's changed in the visitors).
> >     
> >     As an example, you could break these into two commits and send two separate reviews very easily with post-reviews.py. :)

Sorry about that. In this case, processTasks() was merged into launchTasks() so the task visitors either needed to be prototyped or moved above launchTasks. 


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 1250-1253
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1250>
> >
> >     Does the typedef prevent you from directly returning the string, using the implicit Option<T> constructor?
> >     
> >     return "Offer " + ...;
> >     
> >     Ditto for SlaveChecker and UniqueOfferIDChecker.

Ah, right - good point! I was too heavily influenced by the old visitors :) 


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1256
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1256>
> >
> >     Does this work with the typedef?
> >     
> >     return None();
> >     
> >     Ditto for SlaveChecker and UniqueOfferIDChecker.

Thanks! done


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1263
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1263>
> >
> >     This sounds like UniqueSlaveChecker, although I'm surprised to see this checking for null pointers :(
> >     
> >     Can this take const& for all of these? 
> >     
> >     Looking at TaskInfoChecker, even though it takes pointers, none of the visitors are checking for NULL and it seems they should be taking const references instead. Feel free to make that change here or leave as is, but for the OfferVisitors can you take const& for the arguments?

I changed the arguments to be const references instead and removed the null check.


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 1323-1328
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1323>
> >
> >     What if we broke up the validation steps? 
> >     
> >     First the offers are validated, then cleanup the offer visitors after this point (rather than at the very end with the task visitors). If there was an error, we have to remove the offers and return.
> >     
> >     Now, we validate the tasks, launching the valid ones. Then, always cleanup the task visitors / allocator / offers.
> >     
> >     Let's try to flatten the control flow a bit and avoid the nested break statements.

Good point, I moved the offer validation tear-down up and flattened the control flow. I haven't been able to get rid of the nested break unfortunately.


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1335
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1335>
> >
> >     Definitely! Maybe a better place for this kind of TODO is on install() so that others find it more easily?

I will follow up with a subsequent patch with this todo.


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1380
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1380>
> >
> >     Let's handle the NULL slave here instead? Would be great if the Visitor can take const references instead of pointers.

Yep - done.


> On Nov. 4, 2013, 8:37 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1521
> > <https://reviews.apache.org/r/14669/diff/13/?file=373625#file373625line1521>
> >
> >     Can you add a CHECK_NOTNULL in removeOffer()?

CHECK_NOTNULL will fail if you give a list of offers with duplicates. I thought it would be more graceful to ignore it. Do you still think we should assert? 


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14669/#review28129
-----------------------------------------------------------


On Nov. 13, 2013, 10:19 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14669/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2013, 10:19 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-749
>     https://issues.apache.org/jira/browse/MESOS-749
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Running tasks on more than one offer belonging to a single slave can be useful in situations with multiple out-standing offers.
> 
> This patch extends the usual launchTasks() to accept a vector of OfferIDs. The previous launchTasks (accepting a single OfferID) has been kept for backward compatibility, but this now calls the new launchTasks() with a one-element list.
> This also applied for the JNI and python interfaces, which accepts both formats as well.
> 
> Offers are verified to belong to the same slave and framework, before resources are merged and used.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 380e087 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 
>   src/java/src/org/apache/mesos/SchedulerDriver.java 5b0ca39 
>   src/master/master.hpp e377af8 
>   src/master/master.cpp 8e14a07 
>   src/messages/messages.proto a5dded2 
>   src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d 
>   src/sched/sched.cpp 3abe72f 
>   src/tests/master_tests.cpp bf790d2 
>   src/tests/resource_offers_tests.cpp 2864c9a 
> 
> Diff: https://reviews.apache.org/r/14669/diff/
> 
> 
> Testing
> -------
> 
> Three new tests has been added: LaunchCombinedOfferTest, LaunchAcrossSlavesTest and LaunchDuplicateOfferTest
> This test ensures that:
> 1) Multiple offers can be used to run a single task (requesting the sum of offer resources).
> 2) Offers cannot span multiple slaves.
> 3) No offers can appear more than once in offer list.
> 
> $ make check
> ...
> [ RUN      ] MasterTest.LaunchCombinedOfferTest
> [       OK ] MasterTest.LaunchCombinedOfferTest (2010 ms)
> [ RUN      ] MasterTest.LaunchAcrossSlavesTest
> [       OK ] MasterTest.LaunchAcrossSlavesTest (3 ms)
> [ RUN      ] MasterTest.LaunchDuplicateOfferTest
> [       OK ] MasterTest.LaunchDuplicateOfferTest (3 ms)
> ...
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>