You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Jordan Ly <jo...@gmail.com> on 2018/02/28 02:11:56 UTC

Review Request 65824: Second try at fixing cron id collision bug by avoiding state in Quartz jobs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65824/
-----------------------------------------------------------

Review request for Aurora, David McLaughlin, Renan DelValle, and Santhosh Kumar Shanmugham.


Repository: aurora


Description
-------

I had to revert the previous fix due to a bug around `JobDataMap` and manipulating it outside of a Quartz execution. See https://reviews.apache.org/r/65810/ for context.

I believe that we can mitigate this by creating an `AtomicBoolean` and manipulating that within the `batchWorker` execution as opposed to directly changing the `JobDataMap`. This mirrors the previous behavior of `killFollowups` but while keeping the state within a given job context.


Diffs
-----

  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 3604dd4c84043def3fe098ca0f87bf8bab99b3db 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5083fcd407a069617634b82db1ad799b5d4d4551 


Diff: https://reviews.apache.org/r/65824/diff/1/


Testing
-------

`./gradlew test`

Manually verified bug in previous patch no longer occurs. Verified the bug the previous patch was fixing no longer occurs as well.
Will deploy to a test cluster.

Question for reviewers: Is it worthwhile to add end-to-end tests for scheduling crons and ensuring KILL_EXISTING works? The reason I am hesitant is that the coverage of this feature seem comprehensive enough but the bug that slipped through the previous patch was due to a subtle interaction with the Quartz scheduler which we mock in unit tests.


Thanks,

Jordan Ly


Re: Review Request 65824: Second try at fixing cron id collision bug by avoiding state in Quartz jobs

Posted by Jordan Ly <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65824/#review198371
-----------------------------------------------------------




src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java
Lines 158-160 (original)
<https://reviews.apache.org/r/65824/#comment278509>

    Reviewers note: I found that this was extraneous so I removed it.


- Jordan Ly


On Feb. 28, 2018, 2:11 a.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65824/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2018, 2:11 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> I had to revert the previous fix due to a bug around `JobDataMap` and manipulating it outside of a Quartz execution. See https://reviews.apache.org/r/65810/ for context.
> 
> I believe that we can mitigate this by creating an `AtomicBoolean` and manipulating that within the `batchWorker` execution as opposed to directly changing the `JobDataMap`. This mirrors the previous behavior of `killFollowups` but while keeping the state within a given job context.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 3604dd4c84043def3fe098ca0f87bf8bab99b3db 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5083fcd407a069617634b82db1ad799b5d4d4551 
> 
> 
> Diff: https://reviews.apache.org/r/65824/diff/1/
> 
> 
> Testing
> -------
> 
> `./gradlew test`
> 
> Manually verified bug in previous patch no longer occurs. Verified the bug the previous patch was fixing no longer occurs as well.
> Will deploy to a test cluster.
> 
> Question for reviewers: Is it worthwhile to add end-to-end tests for scheduling crons and ensuring KILL_EXISTING works? The reason I am hesitant is that the coverage of this feature seem comprehensive enough but the bug that slipped through the previous patch was due to a subtle interaction with the Quartz scheduler which we mock in unit tests.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>


Re: Review Request 65824: Second try at fixing cron id collision bug by avoiding state in Quartz jobs

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65824/#review198373
-----------------------------------------------------------


Ship it!




Master (e3f496a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Feb. 28, 2018, 2:11 a.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65824/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2018, 2:11 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> I had to revert the previous fix due to a bug around `JobDataMap` and manipulating it outside of a Quartz execution. See https://reviews.apache.org/r/65810/ for context.
> 
> I believe that we can mitigate this by creating an `AtomicBoolean` and manipulating that within the `batchWorker` execution as opposed to directly changing the `JobDataMap`. This mirrors the previous behavior of `killFollowups` but while keeping the state within a given job context.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 3604dd4c84043def3fe098ca0f87bf8bab99b3db 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5083fcd407a069617634b82db1ad799b5d4d4551 
> 
> 
> Diff: https://reviews.apache.org/r/65824/diff/1/
> 
> 
> Testing
> -------
> 
> `./gradlew test`
> 
> Manually verified bug in previous patch no longer occurs. Verified the bug the previous patch was fixing no longer occurs as well.
> Will deploy to a test cluster.
> 
> Question for reviewers: Is it worthwhile to add end-to-end tests for scheduling crons and ensuring KILL_EXISTING works? The reason I am hesitant is that the coverage of this feature seem comprehensive enough but the bug that slipped through the previous patch was due to a subtle interaction with the Quartz scheduler which we mock in unit tests.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>


Re: Review Request 65824: Second try at fixing cron id collision bug by avoiding state in Quartz jobs

Posted by Renan DelValle <re...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65824/#review198503
-----------------------------------------------------------



Adding an end to end test would be the way to go since some subtle details (that are difficult to mock) let the previous patch pass without setting off any alarms.

Other than that, if I'm understanding correctly what the patch is trying to accomplish, looks like a good solution.

- Renan DelValle


On Feb. 27, 2018, 6:11 p.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65824/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2018, 6:11 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> I had to revert the previous fix due to a bug around `JobDataMap` and manipulating it outside of a Quartz execution. See https://reviews.apache.org/r/65810/ for context.
> 
> I believe that we can mitigate this by creating an `AtomicBoolean` and manipulating that within the `batchWorker` execution as opposed to directly changing the `JobDataMap`. This mirrors the previous behavior of `killFollowups` but while keeping the state within a given job context.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 3604dd4c84043def3fe098ca0f87bf8bab99b3db 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5083fcd407a069617634b82db1ad799b5d4d4551 
> 
> 
> Diff: https://reviews.apache.org/r/65824/diff/1/
> 
> 
> Testing
> -------
> 
> `./gradlew test`
> 
> Manually verified bug in previous patch no longer occurs. Verified the bug the previous patch was fixing no longer occurs as well.
> Will deploy to a test cluster.
> 
> Question for reviewers: Is it worthwhile to add end-to-end tests for scheduling crons and ensuring KILL_EXISTING works? The reason I am hesitant is that the coverage of this feature seem comprehensive enough but the bug that slipped through the previous patch was due to a subtle interaction with the Quartz scheduler which we mock in unit tests.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>