You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Maxim Khutornenko <ma...@apache.org> on 2015/02/04 03:03:59 UTC

Re: Review Request 30225: Modifying update controller to support heartbeats.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 4, 2015, 2:03 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
-------

Added "blocked" persistent states to track pulse history


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5 
  src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70913
-----------------------------------------------------------


Master (8bcb2ba) is red with this patch.
  ./build-support/jenkins/build.sh

make[4]: Entering directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1/tutorial'
../compiler/cpp/thrift --gen html -r ../tutorial/tutorial.thrift
make[4]: Leaving directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1/tutorial'
make[3]: Leaving directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1/tutorial'
make[3]: Entering directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1'
make[3]: Leaving directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1'
make[2]: Leaving directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1'
make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift/thrift-0.9.1'
make: Leaving directory `/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/thrift'
:api:classesThriftNote: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:api:checkPython
:api:generateThriftEntitiesJava
:api:classesThriftEntities
:api:compileJava UP-TO-DATE
:api:generateThriftResources
:api:processResources UP-TO-DATE
:api:classes
:api:jar
:compileJavaNote: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/com/twitter/common/args/apt/cmdline.arg.info.txt.2
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java:166: error: cannot find symbol
            storeProvider.getJobUpdateStore().fetchJobUpdateSummaries(queryByJob(job));
                                                                      ^
  symbol: method queryByJob(IJobKey)
1 error
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 1 mins 19.808 secs


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 4, 2015, 2:29 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 2:29 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 5, 2015, 1:15 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 293
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line293>
> >
> >     I think we still want to update the last pulse time even if it's paused?

Makes sense. A paused update is still technically active. Changed to support PAUSED pulse.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71114
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71114
-----------------------------------------------------------



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment116725>

    I think we still want to update the last pulse time even if it's paused?


- David McLaughlin


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.

"I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 

"Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).


"It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.
> 
> Maxim Khutornenko wrote:
>     "I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 
>     
>     "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).
>     
>     
>     "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.
> 
> Bill Farner wrote:
>     > I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK)
>     
>     Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update?
>     
>     I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     > the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature
>     
>     Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`).  These fall back to the same loop, but give the updater a time after which the state should be re-evaluated.

>Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.

I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are concerned about obuse I think having a rate limiter would make more sense here. Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by David McLaughlin <da...@dmclaughlin.com>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.
> 
> Maxim Khutornenko wrote:
>     "I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 
>     
>     "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).
>     
>     
>     "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.
> 
> Bill Farner wrote:
>     > I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK)
>     
>     Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update?
>     
>     I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     > the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature
>     
>     Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`).  These fall back to the same loop, but give the updater a time after which the state should be re-evaluated.
> 
> Maxim Khutornenko wrote:
>     >Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are concerned about obuse I think having a rate limiter would make more sense here. Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.
> 
> Bill Farner wrote:
>     > Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.
>     
>     That's not _necessarily_ required, you could contain it in a `Runnable` implementation that is responsible for the async transition.
> 
> Maxim Khutornenko wrote:
>     Not sure I get it. If it's in async part that means the pulse response is already gone and the external service is oblivious what it actually pulsed.

For me the logic for pulse is fine, it's just the acquisition of the write lock that is a little off. It should be this:

    pulse(JobKey key, String updateId) { 
       // fetch the update status
       // if the update is terminal 
          // return GO_AWAY
       // if the update is non-terminal
          // update the pulse
       
       // if the update is currently blocked
          // acquire write lock
             // change status
    }
    
    
Give or take some operations. You could remove the fetching of the update status, but then you're just asking the client to start polling the scheduler every pulse instead to make the same decision.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 565
> > <https://reviews.apache.org/r/30225/diff/4/?file=848240#file848240line565>
> >
> >     Maybe s/BLOCKED/AWAITING_PULSE/?
> >     
> >     That would at least self-document and avoid additional terminology.

Renamed.


> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java, line 49
> > <https://reviews.apache.org/r/30225/diff/4/?file=848242#file848242line49>
> >
> >     s/query/fetch/ to be consistent with the method above.

I started with "fetch" but that resulted in an overloaded method that I did not quite liked. Changed it back.


> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 106
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line106>
> >
> >     Remove Optional here, do Optional.of() at the call site.

Removed.


> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 122
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line122>
> >
> >     TODO + ticket, this map needs to be exposed via a debug endpoint

Done.


> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.
> 
> Maxim Khutornenko wrote:
>     "I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 
>     
>     "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).
>     
>     
>     "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.
> 
> Bill Farner wrote:
>     > I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK)
>     
>     Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update?
>     
>     I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     > the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature
>     
>     Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`).  These fall back to the same loop, but give the updater a time after which the state should be re-evaluated.
> 
> Maxim Khutornenko wrote:
>     >Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are concerned about obuse I think having a rate limiter would make more sense here. Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.
> 
> Bill Farner wrote:
>     > Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.
>     
>     That's not _necessarily_ required, you could contain it in a `Runnable` implementation that is responsible for the async transition.
> 
> Maxim Khutornenko wrote:
>     Not sure I get it. If it's in async part that means the pulse response is already gone and the external service is oblivious what it actually pulsed.
> 
> David McLaughlin wrote:
>     For me the logic for pulse is fine, it's just the acquisition of the write lock that is a little off. It should be this:
>     
>         pulse(JobKey key, String updateId) { 
>            // fetch the update status
>            // if the update is terminal 
>               // return GO_AWAY
>            // if the update is non-terminal
>               // update the pulse
>            
>            // if the update is currently blocked
>               // acquire write lock
>                  // change status
>         }
>         
>         
>     Give or take some operations. You could remove the fetching of the update status, but then you're just asking the client to start polling the scheduler every pulse instead to make the same decision.

Moved to async action to avoid delaying pulse response.


> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml, line 301
> > <https://reviews.apache.org/r/30225/diff/4/?file=848249#file848249line301>
> >
> >     is it possible to reduce the diff a bit by moving this block to its former location, or is it needed

This is diff playing tricks. I moved the filter code out to reuse it with details select.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Bill Farner <wf...@apache.org>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.
> 
> Maxim Khutornenko wrote:
>     "I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 
>     
>     "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).
>     
>     
>     "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.
> 
> Bill Farner wrote:
>     > I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK)
>     
>     Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update?
>     
>     I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     > the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature
>     
>     Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`).  These fall back to the same loop, but give the updater a time after which the state should be re-evaluated.
> 
> Maxim Khutornenko wrote:
>     >Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are concerned about obuse I think having a rate limiter would make more sense here. Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.

> Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.

That's not _necessarily_ required, you could contain it in a `Runnable` implementation that is responsible for the async transition.


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.
> 
> Maxim Khutornenko wrote:
>     "I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 
>     
>     "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).
>     
>     
>     "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.
> 
> Bill Farner wrote:
>     > I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK)
>     
>     Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update?
>     
>     I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     > the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature
>     
>     Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`).  These fall back to the same loop, but give the updater a time after which the state should be re-evaluated.
> 
> Maxim Khutornenko wrote:
>     >Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.
>     
>     I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are concerned about obuse I think having a rate limiter would make more sense here. Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.
> 
> Bill Farner wrote:
>     > Also, I still need to pull instructions to determine if the pulse is legit (i.e. the update is actually a coordinated update). Otherwise, pulse goes nowhere and external service continues to pulse the wrong update.
>     
>     That's not _necessarily_ required, you could contain it in a `Runnable` implementation that is responsible for the async transition.

Not sure I get it. If it's in async part that means the pulse response is already gone and the external service is oblivious what it actually pulsed.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Bill Farner <wf...@apache.org>.

> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.
> 
> Maxim Khutornenko wrote:
>     "I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. 
>     
>     "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK).
>     
>     
>     "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature.

> I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK)

Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can shake those requirements out?  For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update?

I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase.

> the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature

Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`).  These fall back to the same loop, but give the updater a time after which the state should be re-evaluated.


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


I suggest you skip to the big comment before paying attention to the smaller ones.  If you agree with the concern, i suggest an initial focused diff that implements receiving heartbeats, and asynchronously reacting to the lack of their presence.


api/src/main/thrift/org/apache/aurora/gen/api.thrift
<https://reviews.apache.org/r/30225/#comment116606>

    Maybe s/BLOCKED/AWAITING_PULSE/?
    
    That would at least self-document and avoid additional terminology.



src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java
<https://reviews.apache.org/r/30225/#comment116607>

    s/query/fetch/ to be consistent with the method above.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment116619>

    Remove Optional here, do Optional.of() at the call site.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment116620>

    TODO + ticket, this map needs to be exposed via a debug endpoint



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment116626>

    I think avoid acquisition of a write lock here is a good goal to aim for.  If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration.
    
    Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency.  This suggests to me that potentially-expensive read operations should be avoided if possible.
    
    For these reasons, i think it would be wise to decouple recording of pulses from reacting to them.  It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action.  I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input.



src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
<https://reviews.apache.org/r/30225/#comment116643>

    is it possible to reduce the diff a bit by moving this block to its former location, or is it needed


- Bill Farner


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71143
-----------------------------------------------------------

Ship it!


Master (edcc252) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 5, 2015, 2:34 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2015, 2:34 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 5, 2015, 6:07 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 308
> > <https://reviews.apache.org/r/30225/diff/5/?file=849498#file849498line308>
> >
> >     Deferrment is not quite what i had in mind.  I was thinking something more along the lines of how `TaskTimeout` works, where receiving a pulse causes you to schedule an action in the future that will change the update state (provided no more pulses are received).  If we go this route, i think a separate class should be used to manage heartbeats, and this presents a logical place to break this diff apart.
> >     
> >     As the code stands, we still have the problem where a scheduler with idle inputs will not transition the update state.

>As the code stands, we still have the problem where a scheduler with idle inputs will not transition the update state.
I still don't see a reason to overcomplicate the logic here. Why move the update into a new state when there is no internal activity and why should it be any different that instance-driven transition events?


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71245
-----------------------------------------------------------


On Feb. 5, 2015, 2:34 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2015, 2:34 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71245
-----------------------------------------------------------



src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java
<https://reviews.apache.org/r/30225/#comment116915>

    s/query/fetch/



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment116917>

    Deferrment is not quite what i had in mind.  I was thinking something more along the lines of how `TaskTimeout` works, where receiving a pulse causes you to schedule an action in the future that will change the update state (provided no more pulses are received).  If we go this route, i think a separate class should be used to manage heartbeats, and this presents a logical place to break this diff apart.
    
    As the code stands, we still have the problem where a scheduler with idle inputs will not transition the update state.


- Bill Farner


On Feb. 5, 2015, 2:34 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2015, 2:34 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71545
-----------------------------------------------------------

Ship it!


Master (11a65d2) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 7, 2015, 2:43 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2015, 2:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71729
-----------------------------------------------------------

Ship it!


Pulse logic LGTM.

- David McLaughlin


On Feb. 7, 2015, 2:43 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2015, 2:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review72022
-----------------------------------------------------------

Ship it!


Master (7b531e9) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 11, 2015, 7:19 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2015, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Joshua Cohen <jc...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review72028
-----------------------------------------------------------

Ship it!


Ship It!

- Joshua Cohen


On Feb. 11, 2015, 10:01 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2015, 10:01 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review72041
-----------------------------------------------------------

Ship it!


Master (7b531e9) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 11, 2015, 10:01 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2015, 10:01 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 11, 2015, 10:01 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
-------

Josh's comments.


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

UPDATE:
- Added explicit states to capture "blocked" updates
- Refactored pulseUpdate() to not rely on DB state
- Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review72016
-----------------------------------------------------------

Ship it!


Ship It!

- Bill Farner


On Feb. 11, 2015, 7:19 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2015, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 11, 2015, 8:46 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 276
> > <https://reviews.apache.org/r/30225/diff/7/?file=860914#file860914line276>
> >
> >     can drop the else?

Sure.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review72014
-----------------------------------------------------------


On Feb. 11, 2015, 7:19 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2015, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Joshua Cohen <jc...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review72014
-----------------------------------------------------------

Ship it!



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117942>

    can drop the else?


- Joshua Cohen


On Feb. 11, 2015, 7:19 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2015, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 11, 2015, 7:19 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
-------

Bill's comments.


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

UPDATE:
- Added explicit states to capture "blocked" updates
- Refactored pulseUpdate() to not rely on DB state
- Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.

> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, lines 282-292
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line282>
> >
> >     There's a race on access to `pulseStates`.  Terminating an update between :282 and :287 would cause a stale entry in the map (`storage.read` doesn't offer a lock, but we shouldn't count on that anyway).  Consider `synchronized (pulseStates)`.
> >     
> >     IMHO, though, this points out that this class is difficult to reason about now that multiple disparate locks are in play.  Did you consider extracting an inner class to use?  This would be different from the fully-isolated class we discussed before, where you can share implementation details and accept high coupling, but at least isolate management of `pulseStates` and manage it with higher-level operations from the rest of the class.  Seems like you've already done this to a degree by keeping methods managing this map near each other.
> >     
> >     Your call on this.

The read lock is no more but old habits die hard. You're right, it's no longer required after removing all DB access from this method. Dropped the storage code and consolidated all map access in a private class.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 277
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line277>
> >
> >     `storeProvider` is not used, do you need the `storage.read` at all?

Dropped.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java, line 166
> > <https://reviews.apache.org/r/30225/diff/6/?file=853464#file853464line166>
> >
> >     s/query/fetch/

That's already taken. Renamed to "job_update_store_fetch_details_list".


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 123
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line123>
> >
> >     Please document this map more thoroughly.  How are entries added here, how are they removed, etc.
> >     
> >     For example, one useful detail is that we retain an entry for an update that is paused.

Done.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 124
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line124>
> >
> >     Please include in test coverage some validation of the contents of this map so we can gain confidence that we do not have a memory leak.

It was already tested by the following assertion: `assertEquals(JobUpdatePulseStatus.FINISHED, updater.pulse(UPDATE_ID));`. This would never return FINISHED if there was an entry in the map.

Added to all pulse tests to make sure all scenarios are validated.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 305
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line305>
> >
> >     Coverage report indicates this line is not covered.  Can you try to cover it?

Done.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 488
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line488>
> >
> >     Shouldn't this be in an `else`?  Looks like we've just wiped a pulse record for terminal updates, but we add another here.

It's updated conditionally if PulseState exists. Added else to make it more clear.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 545
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line545>
> >
> >     This only has one caller, consider inlining.

Yeah, there is only one caller now after refactoring. Merged.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 547
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line547>
> >
> >     Coverage report indicates an uncovered branch here.  Can you try to cover it?

This one is tricky as it's yellow due to `pulse == null`. Given the trivial nature of this branch and complexity of covering it, I'd like to punt it.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 721
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line721>
> >
> >     At risk of being overly-verbose, how about `initializePulseMonitor`?  A quick skim of `initializePulse` makes it sound like it might be simulating an inital pulse.

How about `initializePulseState`? This seems to fit well with the new refactored implementation.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 776
> > <https://reviews.apache.org/r/30225/diff/6/?file=853467#file853467line776>
> >
> >     Your call, but you could massage out the `clock` if you store `expirationMs` and have `getExpirationMs()` instead of storing `lastPulseMs`.

I'd rather keep the last pulse instead as it will be better for debugging endpoint coming in AURORA-1103.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java, line 165
> > <https://reviews.apache.org/r/30225/diff/6/?file=853468#file853468line165>
> >
> >     This is never used as a `Function`, strongly consider converting to a method.  Ditto elsewhere if it applies.

Converted to a method.


> On Feb. 11, 2015, 1:03 a.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java, lines 700-726
> > <https://reviews.apache.org/r/30225/diff/6/?file=853470#file853470line700>
> >
> >     I think you've done the right thing, but note that this unit test is inconsistent with your reply on review https://reviews.apache.org/r/30804/

I still stand by that comment as it's applicable to thrift tests that don't require extensive setup. It's quite the opposite here where setup is complicated and every test takes a perf toll to run.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71879
-----------------------------------------------------------


On Feb. 7, 2015, 2:43 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2015, 2:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71879
-----------------------------------------------------------


I like the shape of this, thanks for working through a few different paths.


src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java
<https://reviews.apache.org/r/30225/#comment117774>

    s/query/fetch/



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117776>

    Please document this map more thoroughly.  How are entries added here, how are they removed, etc.
    
    For example, one useful detail is that we retain an entry for an update that is paused.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117801>

    Please include in test coverage some validation of the contents of this map so we can gain confidence that we do not have a memory leak.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117779>

    `storeProvider` is not used, do you need the `storage.read` at all?



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117778>

    There's a race on access to `pulseStates`.  Terminating an update between :282 and :287 would cause a stale entry in the map (`storage.read` doesn't offer a lock, but we shouldn't count on that anyway).  Consider `synchronized (pulseStates)`.
    
    IMHO, though, this points out that this class is difficult to reason about now that multiple disparate locks are in play.  Did you consider extracting an inner class to use?  This would be different from the fully-isolated class we discussed before, where you can share implementation details and accept high coupling, but at least isolate management of `pulseStates` and manage it with higher-level operations from the rest of the class.  Seems like you've already done this to a degree by keeping methods managing this map near each other.
    
    Your call on this.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117821>

    Coverage report indicates this line is not covered.  Can you try to cover it?



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117780>

    Shouldn't this be in an `else`?  Looks like we've just wiped a pulse record for terminal updates, but we add another here.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117786>

    This only has one caller, consider inlining.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117819>

    Coverage report indicates an uncovered branch here.  Can you try to cover it?



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117775>

    At risk of being overly-verbose, how about `initializePulseMonitor`?  A quick skim of `initializePulse` makes it sound like it might be simulating an inital pulse.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
<https://reviews.apache.org/r/30225/#comment117781>

    Your call, but you could massage out the `clock` if you store `expirationMs` and have `getExpirationMs()` instead of storing `lastPulseMs`.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java
<https://reviews.apache.org/r/30225/#comment117791>

    This is never used as a `Function`, strongly consider converting to a method.  Ditto elsewhere if it applies.



src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
<https://reviews.apache.org/r/30225/#comment117803>

    I think you've done the right thing, but note that this unit test is inconsistent with your reply on review https://reviews.apache.org/r/30804/


- Bill Farner


On Feb. 7, 2015, 2:43 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2015, 2:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> UPDATE:
> - Added explicit states to capture "blocked" updates
> - Refactored pulseUpdate() to not rely on DB state
> - Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 7, 2015, 2:43 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description (updated)
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

UPDATE:
- Added explicit states to capture "blocked" updates
- Refactored pulseUpdate() to not rely on DB state
- Dropped JobUpdatePulseState.PAUSED as it does not seem necessary.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 5, 2015, 2:34 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
-------

Bill's and David's feedback.


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
  src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70992
-----------------------------------------------------------

Ship it!


Master (8bcb2ba) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 4, 2015, 5:24 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
-------

Fixing method name escaped during rebase.


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
  src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
-----------------------------------------------------------

(Updated Feb. 4, 2015, 2:29 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
-------

Rebased.


Bugs: AURORA-1010
    https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
-------

Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).

Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
  src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 
  src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
-------

./gradlew -Pq build


Thanks,

Maxim Khutornenko


Re: Review Request 30225: Modifying update controller to support heartbeats.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70904
-----------------------------------------------------------


This patch does not apply cleanly on master (9fe6d54), do you need to rebase?

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 4, 2015, 2:03 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 2:03 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e 
>   src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>