You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Kapil Arya <ka...@mesosphere.io> on 2014/08/09 02:48:57 UTC

Re: Review Request 22066: Added timeout to rescind unused offers

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 8, 2014, 8:48 p.m.)


Review request for mesos, Adam B and Niklas Nielsen.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.


Diffs (updated)
-----

  src/master/flags.hpp 32704ce 
  src/master/master.hpp d8a4d9e 
  src/master/master.cpp 273a516 
  src/tests/master_tests.cpp 5a1cf7f 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added MasterTest.RescindResourceOffers and ran make check.


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Timothy Chen <tn...@gmail.com>.
Hi Ben,

Kapil is continuing where I left off and finish the patch (that he originally started himself).

So the one he posted should have all my changes plus addressing your comments.

Tim

Sent from my iPhone

> On Aug 12, 2014, at 2:18 PM, Benjamin Mahler <be...@gmail.com> wrote:
> 
> Hey Kapil, it looks like Tim had been following up with a separate review for this:
> 
> https://reviews.apache.org/r/22796/
> 
> 
>> On Mon, Aug 11, 2014 at 8:09 PM, Mesos ReviewBot <de...@mesos.apache.org> wrote:
>> 
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/22066/#review50281
>> -----------------------------------------------------------
>> 
>> 
>> Patch looks great!
>> 
>> Reviews applied: [22066]
>> 
>> All tests passed.
>> 
>> - Mesos ReviewBot
>> 
>> 
>> On Aug. 9, 2014, 12:57 a.m., Kapil Arya wrote:
>> >
>> > -----------------------------------------------------------
>> > This is an automatically generated e-mail. To reply, visit:
>> > https://reviews.apache.org/r/22066/
>> > -----------------------------------------------------------
>> >
>> > (Updated Aug. 9, 2014, 12:57 a.m.)
>> >
>> >
>> > Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
>> >
>> >
>> > Bugs: mesos-186
>> >     https://issues.apache.org/jira/browse/mesos-186
>> >
>> >
>> > Repository: mesos-git
>> >
>> >
>> > Description
>> > -------
>> >
>> > A timer is associated with each newly created offer.
>> > The offer is rescinded on timeout.
>> > The timer is disarmed on a launchTask or decline.
>> >
>> >
>> > Diffs
>> > -----
>> >
>> >   src/master/flags.hpp 0db4c95
>> >   src/master/master.hpp 29e8f49
>> >   src/master/master.cpp e688b41
>> >   src/tests/master_tests.cpp 9de2424
>> >
>> > Diff: https://reviews.apache.org/r/22066/diff/
>> >
>> >
>> > Testing
>> > -------
>> >
>> > Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
>> >
>> > make check
>> >
>> >
>> > Thanks,
>> >
>> > Kapil Arya
>> >
>> >
> 

Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Benjamin Mahler <be...@gmail.com>.
Hey Kapil, it looks like Tim had been following up with a separate review
for this:

https://reviews.apache.org/r/22796/


On Mon, Aug 11, 2014 at 8:09 PM, Mesos ReviewBot <de...@mesos.apache.org>
wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/#review50281
> -----------------------------------------------------------
>
>
> Patch looks great!
>
> Reviews applied: [22066]
>
> All tests passed.
>
> - Mesos ReviewBot
>
>
> On Aug. 9, 2014, 12:57 a.m., Kapil Arya wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/22066/
> > -----------------------------------------------------------
> >
> > (Updated Aug. 9, 2014, 12:57 a.m.)
> >
> >
> > Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and
> Timothy Chen.
> >
> >
> > Bugs: mesos-186
> >     https://issues.apache.org/jira/browse/mesos-186
> >
> >
> > Repository: mesos-git
> >
> >
> > Description
> > -------
> >
> > A timer is associated with each newly created offer.
> > The offer is rescinded on timeout.
> > The timer is disarmed on a launchTask or decline.
> >
> >
> > Diffs
> > -----
> >
> >   src/master/flags.hpp 0db4c95
> >   src/master/master.hpp 29e8f49
> >   src/master/master.cpp e688b41
> >   src/tests/master_tests.cpp 9de2424
> >
> > Diff: https://reviews.apache.org/r/22066/diff/
> >
> >
> > Testing
> > -------
> >
> > Added one more test after Tim's patch to test offer is not rescinded
> once it has been declined.
> >
> > make check
> >
> >
> > Thanks,
> >
> > Kapil Arya
> >
> >
>
>

Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review50281
-----------------------------------------------------------


Patch looks great!

Reviews applied: [22066]

All tests passed.

- Mesos ReviewBot


On Aug. 9, 2014, 12:57 a.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2014, 12:57 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 0db4c95 
>   src/master/master.hpp 29e8f49 
>   src/master/master.cpp e688b41 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review50377
-----------------------------------------------------------


Thanks Kapil, looks great, just some minor cleanups for the tests.

These items are not 'issues' per se, but I've made them issues so that you can fix or drop them as a checklist. I wish ReviewBoard would add a non-"issue" way to tag things in a manner that they tracked as addressed or dropped, but alas!


src/master/flags.hpp
<https://reviews.apache.org/r/22066/#comment88144>

    End with a period. :)



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88152>

    Maybe s/OfferTimeoutNonZero/OfferTimeout/ here and s/OfferTimeoutZero/NoOfferTimeout/ below? Seems a bit clearer?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88146>

    Why do you need to wait for this? Seems we could remove the FUTURE_PROTOBUF and AWAIT_READY here.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88145>

    s/resourceOffered/resourceOffers/
    
    When we name these Futures satisfied by the callbacks, we'll try to match their name with the callback name.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88147>

    Please end comments with a period. :)



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88148>

    Doesn't look like the Clock::settle and Clock::resume are necessary.
    
    Since you are waiting for the offer to be rescinded, you don't need to ensure the pending timers/events are flushed in libprocess via Clock::settle.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88149>

    How about expecting that the resources are re-offered to the framework after the rescind?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88153>

    How about saying "if there is no rescind timeout" rather than referring to zero?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88154>

    How about pluralizing since there are many offers: s/offer1/offers1/
    
    Or per my comment above with matching the callback name: s/offer1/resourceOffers1/



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88150>

    Ditto for ending with a period.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88155>

    Don't think you need to capture or wait on this future, since you are waiting for the TASK_RUNNING status below. Anything I'm missing here?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88156>

    You can be explicit about no rescind occurring by using .Times(0) as you did above. Any reason not to here?
    
    



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88151>

    Period here.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88157>

    I don't think this is enough for this test.
    
    We're not sure by the time that we advance the clock whether the offer has been created yet.
    
    I would suggest the following:
    
    Remove this future, instead wait for the decline LaunchTasksMessage to reach the master, then settle the clock to process it. After this point you can advance it and settle it to ensure that no rescind occurs. Make sense? 


- Ben Mahler


On Aug. 12, 2014, 10:14 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2014, 10:14 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 0db4c95 
>   src/master/master.hpp 29e8f49 
>   src/master/master.cpp e688b41 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review51057
-----------------------------------------------------------



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88980>

    I am missing something here. If we remove the Clock::resume() call, how would we ensure that the second call to resourceOffers(&resourceReoffered) is satisfied?
    
    I tried removing the Clock::resume and it did fail indeed.  Adding it back made the test pass so I haven't removed it for the time being.


- Kapil Arya


On Aug. 18, 2014, 5:37 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2014, 5:37 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 5e9ecb5 
>   src/master/master.hpp c9f989a 
>   src/master/master.cpp e948803 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review52485
-----------------------------------------------------------

Ship it!


Thanks Kapil! Will commit this shortly. I made a few minor comments that I will take care of before committing.


src/master/master.cpp
<https://reviews.apache.org/r/22066/#comment91279>

    Let's move this above the ephemeral port stripping so that the two maps 'offers' and 'offerTimers' are manipulated close to one another.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment91282>

    We can capture the offers themselves to ensure that we're being re-offered the same amount of resources.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment91289>

    Would be nice to add a comment here reflecting that the allocation interval is why we needed to resume the clock.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment91285>

    I think we can remove this test, since the resource offer / rescind mechanics with no offer timeout is exercised by most of the integration tests, and the mock scheduler will complain should there be any unexpected calls (e.g. rescind).



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment91290>

    Have you run the tests lately? This test fails because the slave is expecting to hear back from the master within 75 seconds, otherwise it re-registers. At that point, the master rescinds the offers.
    
    I'll update these tests to use a 30 second timeout for now.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment91286>

    "rescinded"


- Ben Mahler


On Sept. 4, 2014, 9:09 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Sept. 4, 2014, 9:09 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 5e9ecb5 
>   src/master/master.hpp c9f989a 
>   src/master/master.cpp 2508b38 
>   src/tests/master_tests.cpp eaa1675 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Sept. 4, 2014, 5:09 p.m.)


Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.


Changes
-------

More updates to reflect Ben's suggestions.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.

This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.


Diffs (updated)
-----

  src/master/flags.hpp 5e9ecb5 
  src/master/master.hpp c9f989a 
  src/master/master.cpp 2508b38 
  src/tests/master_tests.cpp eaa1675 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added one more test after Tim's patch to test offer is not rescinded once it has been declined.

make check


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review51478
-----------------------------------------------------------



src/master/flags.hpp
<https://reviews.apache.org/r/22066/#comment89831>

    Rather than relying on Duration::zero() as a special value indicating no timeout, let's encode this via an Option<Duration> as we've done with other optional flags. :)
    
    Then we can also validate the flag, expecting that if it is provided, it is strictly greater than Duration::zero().


- Ben Mahler


On Aug. 20, 2014, 1:54 a.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2014, 1:54 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 5e9ecb5 
>   src/master/master.hpp c9f989a 
>   src/master/master.cpp 18464ba 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 19, 2014, 9:54 p.m.)


Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.


Changes
-------

Updated to reflect Ben's suggestions.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.

This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.


Diffs (updated)
-----

  src/master/flags.hpp 5e9ecb5 
  src/master/master.hpp c9f989a 
  src/master/master.cpp 18464ba 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added one more test after Tim's patch to test offer is not rescinded once it has been declined.

make check


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Ben Mahler <be...@gmail.com>.

> On Aug. 19, 2014, 10:21 p.m., Ben Mahler wrote:
> > src/tests/master_tests.cpp, lines 2134-2136
> > <https://reviews.apache.org/r/22066/diff/4/?file=662801#file662801line2134>
> >
> >     This comment is no longer relevant, and we can remove the Clock::resume

Unfortunately you have to reply on the main review page for ReviewBoard to keep the thread of conversation intact, maybe they've fixed that in this new version :)

Can you share how it fails when the Clock::resume it provided? So long as we pause/advance the clock *after* the offer timeout has been created, the timeout should fire once the clock is advanced and remains paused.

Since we call 'delay' before sending the offers to the framework, and we have an AWAIT_READY(resourceOffers) before we pause the clock, resuming the clock should be unnecessary since we've paused it *after* the delay timer has been created.

I must be missing something :)


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review51039
-----------------------------------------------------------


On Aug. 20, 2014, 1:54 a.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2014, 1:54 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 5e9ecb5 
>   src/master/master.hpp c9f989a 
>   src/master/master.cpp 18464ba 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.

> On Aug. 19, 2014, 6:21 p.m., Ben Mahler wrote:
> > src/tests/master_tests.cpp, lines 2134-2136
> > <https://reviews.apache.org/r/22066/diff/4/?file=662801#file662801line2134>
> >
> >     This comment is no longer relevant, and we can remove the Clock::resume
> 
> Ben Mahler wrote:
>     Unfortunately you have to reply on the main review page for ReviewBoard to keep the thread of conversation intact, maybe they've fixed that in this new version :)
>     
>     Can you share how it fails when the Clock::resume it provided? So long as we pause/advance the clock *after* the offer timeout has been created, the timeout should fire once the clock is advanced and remains paused.
>     
>     Since we call 'delay' before sending the offers to the framework, and we have an AWAIT_READY(resourceOffers) before we pause the clock, resuming the clock should be unnecessary since we've paused it *after* the delay timer has been created.
>     
>     I must be missing something :)

If I remove Clock::resume(), the test fails with the following error:
[ RUN      ] MasterTest.OfferTimeout
../../src/tests/master_tests.cpp:2147: Failure
Failed to wait 10secs for resourceReoffered
../../src/tests/master_tests.cpp:2120: Failure
Actual function call count doesn't match EXPECT_CALL(sched, resourceOffers(&driver, _))...
         Expected: to be called twice
           Actual: called once - unsatisfied and active
           
Don't we need to resume the clock to allow it to satisfy the next resource offer?


- Kapil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review51039
-----------------------------------------------------------


On Aug. 19, 2014, 9:54 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 19, 2014, 9:54 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 5e9ecb5 
>   src/master/master.hpp c9f989a 
>   src/master/master.cpp 18464ba 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/#review51039
-----------------------------------------------------------


Ok, almost there! One main suggestion: can this flag be optional and disabled by default for now? This might cause trouble for large clusters per my comment below. If disabled by default initially, we can safely test turning on this flag for large clusters, and we don't need to send a big heads up to the user/dev mailing lists about this new behavior.


src/master/flags.hpp
<https://reviews.apache.org/r/22066/#comment88916>

    Can we have this as an optional flag that is disabled initially? My concern is that this will cause a lot of additional offer churn for large clusters (10,000s of slaves). Especially with 5 minutes!
    
    Let's add a TODO here that this description will need to be updated when we have optimistic offers:
    
    E.g.:
    // TODO(karya): When we have optimistic offers, this will only
    // benefit frameworks that accidentally lose an offer.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88927>

    "an offer" :)



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88928>

    This comment is no longer relevant, and we can remove the Clock::resume



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88932>

    /s/  / /



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88929>

    We don't need to resume it, settling will ensure that any pending timers / events are processed.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88931>

    "be rescinded"



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88933>

    No need to resume per comment above.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88930>

    "be rescinded"



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88935>

    It's not the slave that declines, it's the framework. How about:
    
    "Wait for the framework to decline the offers."



src/tests/master_tests.cpp
<https://reviews.apache.org/r/22066/#comment88934>

    No need to resume per comment above.


- Ben Mahler


On Aug. 18, 2014, 9:37 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22066/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2014, 9:37 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.
> 
> 
> Bugs: mesos-186
>     https://issues.apache.org/jira/browse/mesos-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A timer is associated with each newly created offer.
> The offer is rescinded on timeout.
> The timer is disarmed on a launchTask or decline.
> 
> This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 5e9ecb5 
>   src/master/master.hpp c9f989a 
>   src/master/master.cpp e948803 
>   src/tests/master_tests.cpp 9de2424 
> 
> Diff: https://reviews.apache.org/r/22066/diff/
> 
> 
> Testing
> -------
> 
> Added one more test after Tim's patch to test offer is not rescinded once it has been declined.
> 
> make check
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 18, 2014, 5:37 p.m.)


Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.


Changes
-------

Addressed Ben's suggestions.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.

This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.


Diffs (updated)
-----

  src/master/flags.hpp 5e9ecb5 
  src/master/master.hpp c9f989a 
  src/master/master.cpp e948803 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added one more test after Tim's patch to test offer is not rescinded once it has been declined.

make check


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 12, 2014, 6:14 p.m.)


Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description (updated)
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.

This is continuation of Tim's patch: https://reviews.apache.org/r/22796/.  Revision two represents the latest patch from Tim. I have tried to address the comments/concerns from that request into the third revision.


Diffs
-----

  src/master/flags.hpp 0db4c95 
  src/master/master.hpp 29e8f49 
  src/master/master.cpp e688b41 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added one more test after Tim's patch to test offer is not rescinded once it has been declined.

make check


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 8, 2014, 8:57 p.m.)


Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Timothy Chen.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.


Diffs
-----

  src/master/flags.hpp 0db4c95 
  src/master/master.hpp 29e8f49 
  src/master/master.cpp e688b41 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added one more test after Tim's patch to test offer is not rescinded once it has been declined.

make check


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 8, 2014, 8:55 p.m.)


Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.


Diffs
-----

  src/master/flags.hpp 0db4c95 
  src/master/master.hpp 29e8f49 
  src/master/master.cpp e688b41 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing (updated)
-------

Added one more test after Tim's patch to test offer is not rescinded once it has been declined.

make check


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 8, 2014, 8:52 p.m.)


Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.


Diffs
-----

  src/master/flags.hpp 0db4c95 
  src/master/master.hpp 29e8f49 
  src/master/master.cpp e688b41 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added MasterTest.RescindResourceOffers and ran make check.


Thanks,

Kapil Arya


Re: Review Request 22066: Added timeout to rescind unused offers

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22066/
-----------------------------------------------------------

(Updated Aug. 8, 2014, 8:51 p.m.)


Review request for mesos, Adam B and Niklas Nielsen.


Changes
-------

Addressed Ben's comments/concerns from https://reviews.apache.org/r/22796.


Bugs: mesos-186
    https://issues.apache.org/jira/browse/mesos-186


Repository: mesos-git


Description
-------

A timer is associated with each newly created offer.
The offer is rescinded on timeout.
The timer is disarmed on a launchTask or decline.


Diffs (updated)
-----

  src/master/flags.hpp 0db4c95 
  src/master/master.hpp 29e8f49 
  src/master/master.cpp e688b41 
  src/tests/master_tests.cpp 9de2424 

Diff: https://reviews.apache.org/r/22066/diff/


Testing
-------

Added MasterTest.RescindResourceOffers and ran make check.


Thanks,

Kapil Arya